THE DEVELOPMENT OF ADVANCED SENSOR TECHNOLOGIES TO
MEASURE  CRITICAL  NAVY  MOBILITY  FUEL  PROPERTIES 


1.0  SUMMARY 

Current  mobility  fuel  acceptance  is  performed  in  the  shipboard  fuel  laboratory  with  a  series 
of  traditional  fuel  measurements.  A  fully  automated,  sensor-based  analytical  capability  would 
offer  significant  savings  in  manpower  and  consumables,  as  well  as  provide  for  a  safer,  more 
rapid,  and  possibly  more  accurate  analysis.  Laboratory  studies  have  thus  been  undertaken  at  the 
Naval  Research  Laboratory  to  develop  chemometric  methodologies  and  to  assess  measurement 
technologies  that  will  enable  the  implementation  of  sensor-based  instrumentation  capable  of 
measuring  critical  Navy  mobility  fuel  properties. 

One-dimensional  and  multi-way  chemometric  methods  were  developed  to  characterize  trace 
level  compositional  changes  occurring  in  fuels  during  aging  or  thermal  degradation  from  gas 
chromatography  (GC)  and  combined  gas  chromatography-mass  spectrometry  (GC-MS)  analyses. 
Multi-way  analysis  of  GC-MS  data  was  also  shown  to  be  an  extremely  sensitive  and  effective 
method  for  elucidating  the  composition  of  trace  fuel  contaminants.  These  chemometric 
techniques  were  then  used  to  evaluate  several  chromatographic  and  spectroscopic  methods  for 
their  efficacy  in  modeling  critical  fuel  properties. 

The preliminary findings from a small training set of 46 jet fuels from around the
world demonstrated the feasibility of performing quality assurance testing of shipboard fuels
from GC, near-IR (NIR), or Raman spectroscopy. GC offered certain advantages for some
properties,  and  a  data  fusion  approach  in  which  both  chromatography  and  spectroscopy  are 
combined  into  one  model  may  provide  the  means  with  which  to  confirm  the  presence  of  required 
fuel  additives.  In  many  cases,  the  errors  of  prediction  from  partial  least  squares  (PLS)  regressions 
were  within  the  published  errors  of  the  standard  ASTM  test  methods  currently  employed. 
However,  in  order  to  attain  the  necessary  robustness  of  the  predictive  models,  many  more 
samples will need to be incorporated into the training set. Once a sufficient number of training
set samples have been analyzed, the resulting property models can be incorporated into a stand-alone
software application for evaluation by other Navy fuel laboratories. These models would
form  the  basis  for  the  design  of  a  prototype  sensor-based  “black  box”  device  to  replace  the 
current  ASTM  shipboard  fuel  quality  acceptance  test  procedures. 


2.0  INTRODUCTION 

Hydrocarbon  fuels  are  complex  mixtures  of  organic  compounds  that  are  manufactured  to 
comply  with  performance  specifications  based  on  properties,  and  not  on  composition.  Thus  while 
it  is  true  that  fuels  obtained  in  conformance  with  a  particular  specification  may  be  similar  in  their 
physical  properties  and  performance,  their  composition  may  differ  in  both  obvious  and  subtle 
ways.  These  compositional  differences  are  a  consequence  of  many  factors  which  include  refining 
and  finishing  methods,  crude  sources,  handling  methods,  contamination,  and  blending  with  other 
fuels.  Not  only  does  the  chemical  composition  of  each  fuel  differ,  but  the  composition  of  any 
particular  fuel  can  also  change  with  time.  It  is  usually  the  formation  of  insoluble  reaction 
products or changes in a critical property that bring fuel stability into question. Very often, the
chemical  processes  leading  to  stability  problems  are  due  to  the  presence  or  absence  of 
constituents  at  trace  concentration  levels. 

Considering  the  complexity  and  variability  of  fuels,  it  is  not  surprising  that  success  has  often 
eluded  researchers  in  their  efforts  to  understand  and  provide  practical  solutions  to  operational 
problems  attributed  to  undesirable  fuel  chemistry.  As  a  consequence,  research  in  fuel  science  has 
tended  to  be  inductive  in  nature.  Unfortunately,  the  methodologies  adopted  have  all  too  often 
focused  on  producing  and  quantifying  insoluble  reaction  products,  since  they  are  the  most  easily 
quantifiable  manifestation  of  chemical  reactivity  in  fuel.  The  majority  of  laboratory  test  methods 
to  predict  the  tendency  of  fuels  to  undergo  deleterious  changes  during  storage  or  under  thermal 
stress  are  based  on  the  assumption  that  all  the  reaction  rates  in  a  complex  mixture  will  always 
double in a synchronized fashion for each 10 °C increase in temperature. The Arrhenius law has
thus often been employed to produce detectable quantities of insoluble products in laboratory
testing  within  practical  time  periods.  While  the  Arrhenius  law  is  applicable  for  pure  and  simple 
systems,  reliance  on  thermal  acceleration  of  deposition  from  a  complex  mixture,  such  as  a  fuel, 
can  often  yield  irrelevant  results.  The  multiplicity  of  sequential  and  parallel  reaction  pathways 
that  become  available  as  different  activation  energy  requirements  are  met  at  different 
temperatures  can  permit  chemical  processes  to  occur  that  are  not  necessarily  possible  under 
conditions  of  use.  Thus,  while  the  repeatability  of  many  laboratory  fuel  test  devices  and 
methodologies  can  be  very  good,  in  many  instances  the  relevance  to  actual  conditions  of  use  is 
very limited, if not absent. Therefore, it is not always realistic to attach any mechanistic
significance  to  quantities  of  insoluble  products  formed  in  order  to  define  relationships  between 
composition  and  liquid-phase  chemistry. 
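To illustrate why a uniform doubling per 10 °C cannot hold for every reaction in a mixture, the sketch below evaluates the Arrhenius rate-constant ratio k(T2)/k(T1) = exp[(Ea/R)(1/T1 - 1/T2)] for several activation energies. The activation energies and temperatures are hypothetical values chosen only for illustration; they are not taken from this study.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol K)


def rate_ratio(ea_j_per_mol, t1_c, t2_c):
    """Arrhenius ratio k(T2)/k(T1) for a single reaction with activation energy Ea."""
    t1 = t1_c + 273.15  # convert degrees Celsius to kelvin
    t2 = t2_c + 273.15
    return math.exp(ea_j_per_mol / R * (1.0 / t1 - 1.0 / t2))


if __name__ == "__main__":
    # Hypothetical activation energies in kJ/mol (illustrative assumption).
    for ea_kj in (25, 50, 100, 150):
        ratio = rate_ratio(ea_kj * 1e3, 25.0, 35.0)
        print(f"Ea = {ea_kj:3d} kJ/mol -> k(35 C)/k(25 C) = {ratio:.2f}")
```

Under these assumptions only a reaction with an activation energy near 50 kJ/mol roughly doubles in rate between 25 and 35 °C. Reactions with lower or higher barriers accelerate less or more, so the relative contribution of each pathway shifts as the test temperature is raised.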

While  there  are  limits  placed  on  certain  fuel  constituents,  a  fuel’s  quality  and  suitability  for 
use  is  based  on  a  series  of  physical  and  chemical  measurements.  These  measurements  are 
performed  in  accordance  with  accepted  test  procedures  contained  in  the  applicable  ASTM  test 
methods1. The NATOPS Aircraft Refueling Manual2 requires that aviation fuel (JP-5) received
by ships for aircraft fueling be tested for API gravity, flash point, particulates, fuel system icing
inhibitor (FSII), and free water. During flight operations, the fuel that is dispensed to aircraft
must  be  tested  each  day  for  appearance,  particulates,  free  water  and  FSII.  All  of  these  tests  are 
performed  in  the  shipboard  QA  fuel  laboratory,  which  requires  significant  manpower  and  time. 
The  necessity  of  sampling  and  transporting  fuel  samples  from  the  source  to  the  fuel  lab  also 
entails  manpower  costs  and  safety  considerations.  In  addition,  the  ASTM  tests  that  are  employed 
require  that  the  analyst  be  trained  and  familiarized  with  the  fuel  test  methods. 

Shipboard  and  land-based  fuel  handling  operations  would  both  greatly  benefit  from  an 
instrumental  method  to  monitor  fuel  quality  and  to  perform  the  necessary  quality  assurance 
testing.  While  realizing  significant  savings  in  manpower  and  cost,  this  capability  would  also 
reduce  the  time  necessary  to  determine  fuel  quality.  A  sensor-based  fuel  diagnostics  capability 
would significantly reduce, if not eliminate, the safety and disposal issues associated with
laboratory  consumables  now  in  place.  If  this  technology  could  be  developed  to  include  in-line 
real  time  quality  monitoring,  this  would  be  invaluable  in  performing  fuel  quality  monitoring 
throughout  the  fuel  handling  system.  Such  a  system  could  be  automated  to  provide  continuous 


1 ASTM, Specification of Aviation Turbine Fuels. In Annual Book of ASTM Standards; ASTM:
Philadelphia, PA, 1997; Vol. 05.01, D1655-96c.

2 “Aircraft Refueling NATOPS Manual”, NAVAIR Report No. 00-80T-109, 15 Jun 2002.




real-time  fuel  quality  monitoring  for  both  shipboard  and  land-based  fuel  handling  and  distribution 
operations. 

A  sensor-based  fuel  quality  assessment  technology  would  rely  on  predictive  models  based  on 
mathematical  relationships  between  fuel  composition  and  stability.  The  science  of  chemometrics 
has  grown  from  a  need  to  identify  hidden  relationships  in  complex  data.  Chemometric  modeling 
offers  the  potential  for  rapid  analysis,  simultaneous  prediction  of  multiple  properties,  and  the 
ability  to  address  large  data  sets  automatically.  Moreover,  this  approach  would  not  necessarily 
entail  stressing  or  other  treatments  that  could  change  the  chemistry.  Recent  advances  in 
analytical  techniques,  computer  technology  and  the  science  of  chemometrics  have  made  it 
possible to bring these technologies together to address the task of developing a useful predictive
model  correlating  fuel  stability  with  fuel  composition. 

Since  the  compositional  features  that  can  influence  critical  fuel  properties  are  typically 
obscured  by  an  abundance  of  irrelevant  information,  some  of  the  techniques  of  chemometrics 
would  seem  well  suited  to  reveal  hidden  information  in  complex  data  from  fuel  compositional 
analysis.  Several  research  groups  have  successfully  employed  chemometric  methods  to 
discriminate between different fuel types from GC analyses. In 1979, Clark and Jurs3 described
automated classification algorithms that successfully characterized GC data from crude oil
samples according to their origin. Neural networks have also been employed4 to develop
multivariate pattern recognition models for classifying jet fuel samples by type (e.g., JP-4 vs. JP-5).
Lavine and coworkers5 noted that mathematical pattern recognition methods offer a better
approach  to  GC  fuel  analysis  than  visual  inspection  due  to  the  complex  nature  of  processed  fuel 
samples.  They  demonstrated  that  pattern  recognition  methods  could  successfully  identify 
fingerprint  patterns  in  the  GC  data  that  were  characteristic  of  fuel  type,  even  in  the  presence  of 
severe  weathering.  In  each  of  these  applications,  the  information  needed  to  discriminate  the  fuel 
samples  consisted  of  subtle  variations  in  peak  intensities  distributed  across  multiple  peaks  in  the 
chromatograms.  In  order  to  improve  the  selectivity  and  predictive  power  of  principal  component 
analysis  of  GC  data,  Johnson  and  Synovec6  employed  analysis  of  variance  (ANOVA) 
calculations  to  restrict  the  analysis  to  only  those  features  in  the  data  that  were  relevant  to  the 
classification.  In  this  manner,  they  were  able  to  discriminate  between  JP-5,  JP-8  and  JP-TS  in 
mixtures with as little as 1% variation in volume, using comprehensive two-dimensional GC (GC×GC),
and eliminate the impact of geographical variances in the jet fuel samples.

Spectroscopic characterization of fuels has been a mainstay of fuels research for
over 60 years, and the past 15 years have seen a surge of research focused on the development of
rigorous  calibration  models  that  correlate  the  compositional  information  contained  within 
spectroscopic  data  to  selected  fuel  quality  parameters.  Spectroscopic  methods  offer  a  number  of 
advantages,  including  the  relative  simplicity  of  instrumentation,  rapid  analysis  time,  and  high 
quality  of  the  data  from  a  chemometric  perspective.  Chemometric  multivariate  analysis  of 
spectroscopic data is desirable for several reasons. Chief among them is the so-called first-order
advantage, providing the ability to recognize the presence of interferants and calibrate in the
presence  of  known  interferants,  as  well  as  conferring  the  advantages  normally  provided  by  signal 
averaging.  Not  to  be  ignored,  however,  are  the  additional  potentials  of  rapid  analysis, 


3  Clark,  H.  A.;  Jurs,  P.  C.  Anal.  Chem.  1979,  51,  616-623. 

4  Long,  J.  R.;  Mayfield,  H.  T.;  Henley,  M.  V.;  Kromann,  P.  R.  Anal.  Chem.,  1991,  63,  1256- 
1261. 

5  Lavine,  B.  K.;  Mayfield,  H.  T.;  Kromann,  P.  R.;  Faruque,  A.  Anal.  Chem.,  1995,  67,  2846-3852. 

6 Johnson, K. J.; Synovec, R. E. J. Chemometrics and Intell. Lab. Systems, 2002, 60, 225.

simultaneous prediction of multiple properties, and the ability to address large data sets
automatically. 


A  wide  variety  of  fuel  types,  ranging  from  gasoline  to  jet  and  diesel,  have  been  examined 
using both near infrared7,8,9,10 and Fourier transform infrared11 instruments as well as FT-Raman12,13,14
instruments. A number of fuel properties have thus been predicted via chemometric
regression  of  spectroscopic  data,  including  octane/cetane  number15,  flash  point,  freeze  point, 
density,  viscosity,  sulfur  content16,  oxygenates  (such  as  MTBE  and  ethanol),  aromatic,  olefin,  and 
saturate  content,  distillation  fractions,  and  vapor  pressure.  Of  these,  the  correlation  of  octane 
number to NIR spectra has been the most widespread and extensively developed with training
sets  numbering  in  the  thousands,  resulting  in  numerous  commercially  available  octane  analyzers. 


Category     Property
---------    -------------------------------------------
Critical     Flash point
Critical     Fuel system icing inhibitor concentration
Critical     Dirt (particulate) content
Critical     Water content
Critical     Density
Desirable    Aromatics (di-aromatics and mono-aromatics)
Desirable    Additives (Betz, SDA, corrosion inhibitors)
Desirable    Freeze / cloud point
Desirable    Sulfur content
Desirable    Viscosity
Desirable    Lubricity

Table 1. Fuel properties to be targeted for prediction by fuel quality sensors.

In order to assure adequate fuel quality, the shipboard fuel laboratory must be capable of
measuring  the  fuel  properties  shown  in  Table  1.  Moreover,  any  new  technologies  put  in  place  to 
accomplish  this  must  be  capable  of  measuring  those  properties  with  the  same  or  better  precision 
than current methods. The findings of an extensive survey conducted in FY04 indicated that
there  were  no  suitable  commercially  available  solutions  to  measure  these  properties,  with  the 
possible  exception  of  an  optical  light  scattering  sensor  for  the  simultaneous  estimation  of  water 
and  particulates  that  is  being  developed  by  Pressure  Systems,  Inc.  (PSI),  and  currently  under 
evaluation  by  NAVAIR.  In  the  first  phase  of  this  study,  methodologies  for  chemometric 
modeling  of  chromatographic  data  were  developed  to  perform  sensitive  and  compound-specific 
fuel  diagnostics  via  GC  and  GC-MS  of  Navy  mobility  fuels.  In  the  second  phase  of  this  study, 
these  methodologies  were  extended  to  fuel  property  modeling  with  both  chromatographic  and 
spectroscopic  fuel  analysis  data.  The  initial  emphasis  was  placed  on  determining  the  suitability 
of  these  various  analytical  tools  to  measure  the  critical  fuel  properties  in  Table  1,  for 
implementation  in  a  sensor-based  analytical  system. 


3.0  FUEL  DIAGNOSTIC  AND  PROGNOSTIC  MODELING 

3.1  Prediction  Of  Fuel  Failures  In  A  Jet  Engine  Combustor 

An  example  of  the  discontinuity  that  can  exist  between  laboratory  testing  and  actual  engine 
use  was  provided  during  several  incidents  where  a  jet  engine  combustor  underwent  catastrophic 
failure  when  using  JP-5  fuel  from  a  particular  source.  In  this  instance,  the  fuel  that  caused  these 
failures passed all the standard laboratory tests required by the military specification MIL-T-5624.
In addition, a suite of non-standard tests was also conducted without providing an answer
to  the  fuel  dependent  engine  problem.  Chemical  analyses  did  not  reveal  the  presence  of  abnormal 
or  highly  reactive  constituents  that  could  have  been  identified  as  responsible  for  these  failures.  It 
became  apparent  that  the  compositional  uniqueness  of  this  fuel  responsible  for  the  combustor 
failures  was  too  subtle  to  be  revealed  by  a  straightforward  analytical  search  for  known 
constituents.  This  study  was  therefore  undertaken  in  order  to  determine  if  it  would  be  possible  to 
develop  a  correlation  model  that  could  predict  the  incidence  of  combustor  failure,  with  reasonable 
accuracy,  from  a  capillary  GC  analysis  performed  on  the  suspect  fuel  sample.  This  would 
provide  the  means  to  avoid  costly  full-scale  combustion  rig  testing  to  ensure  fuel  suitability  for 
this  particular  jet  engine. 


Experimental 

The  training  set  (fuel  set  #1)  was  comprised  of  15  JP-5  fuels  that  were  obtained  from  various 
sources,  and  verified  to  be  in  accordance  with  MIL-DTL-5624T.  The  origin  of  most  of  the 
samples  could  be  traced  back  to  one  particular  refinery,  where  the  failed  fuels  were  produced. 
Some  samples,  as  shown  in  Table  2,  were  obtained  from  the  distribution  system  and  some  from 
the  refinery  after  process  changes  were  implemented  in  an  effort  to  avoid  the  engine  failures. 
Samples #2 and #11 were obtained from other refineries (#2 and #3, respectively), and sample #14 was taken from a
long-term  storage  facility.  Fuel  sample  #8  was  also  obtained  from  the  distribution  system,  but 
was  of  uncertain  origin.  All  the  samples  in  the  training  set  had  been  tested  in  a  full-scale  jet 
engine  combustion  rig,  which  is  a  pass/fail  test  for  combustor  coking.  Test  samples  1,  3,  12  and 
14  had  failed  the  full-scale  combustion  test,  while  all  the  other  samples  had  passed.  A  second 
group of 10 JP-5 fuels (fuel set #2) was obtained from a distribution system that contained fuel
from refinery #1. These samples were analyzed and classified with the model developed from the
training set (fuel set #1) prior to combustion testing.

Five  replicate  chromatograms  were  obtained  with  an  Agilent  5890  capillary  gas 
chromatograph,  with  electronic  pressure  control,  controlled  via  an  HP  Chemstation.  Samples 
were manually injected with a split/splitless injector at 250 °C, with a split ratio of 60:1. A 50 m OV-101
(crosslinked polysiloxane) column with a flame ionization detector was used. The column
oven was programmed from an initial temperature of 40 °C to 220 °C at 10 °C/min, with a
2 minute hold at 220 °C, giving a total run time of 20 minutes. The GC detector
current  was  sampled  at  a  rate  of  20  Hz,  giving  a  resolution  of  50  ms  per  point. 

A measured quantity of n-eicosane (n-C20) was added to each sample as an internal standard
to  compensate  for  variations  in  sample  injection  and  detector  response.  The  positions  of  the  GC 
peaks  in  all  the  samples  were  aligned  to  match  the  retention  time  of  the  internal  standard  in  one 
reference  chromatogram.  Peak  heights  were  also  normalized  using  the  internal  standard  peak 
from  the  same  reference  sample.  Before  beginning  data  analysis,  the  original  chromatographic 
data  set  of  75  chromatograms  and  16384  data  points  was  reduced  in  size  in  order  to  make  it  more 
manageable  by  computer.  Every  eighth  data  point  in  the  chromatograms  was  used  to  create  a 
reduced  data  set  (75  x  2048),  which  represented  a  chromatographic  sampling  frequency  of  2.5 
samples  per  second  (compared  to  20  samples  per  second  with  the  original  data).  The  data  were 
reorganized  so  that  replicate  chromatograms  were  ordered  consecutively. 
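The size reduction and peak-height normalization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the software used in the study: the function names, the `is_index` argument (position of the internal standard peak), and the synthetic signal are all hypothetical, and the retention time alignment step is not shown.

```python
def decimate(chromatogram, step=8):
    """Keep every `step`-th point (plain subsampling, no smoothing or filtering)."""
    return chromatogram[::step]


def normalize_to_internal_standard(chromatogram, is_index, reference_height):
    """Scale a chromatogram so its internal-standard peak matches the reference height.

    `is_index` is the (hypothetical) data point at which the internal standard elutes.
    """
    scale = reference_height / chromatogram[is_index]
    return [value * scale for value in chromatogram]


if __name__ == "__main__":
    raw = [float(i % 100) for i in range(16384)]  # synthetic stand-in for one run
    reduced = decimate(raw)
    print(len(raw), "->", len(reduced))            # 16384 -> 2048
    print(20.0 / 8, "points per second")           # 2.5 points per second
```

Applied to all 75 chromatograms, the subsampling step yields the 75 x 2048 matrix described in the text.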


Sample   Combustor Test   Fuel Source
No.      Pass / Fail
------   --------------   -----------------------------------------------
1        Fail             Refinery #1, from distribution system
2        Pass             Refinery #2
3        Fail             Refinery #1, from distribution system
4        Pass             Refinery #1, process varied, sampled at refinery
5        Pass             Refinery #1, process varied, sampled at refinery
6        Pass             Refinery #1, from distribution system
7        Pass             Refinery #1, from distribution system
8        Pass             Unknown source, from distribution system
9        Pass             Refinery #1, process varied, sampled at refinery
10       Pass             Refinery #1, process varied, sampled at refinery
11       Pass             Refinery #3, sampled at refinery
12       Fail             Refinery #1, sampled at refinery
13       Pass             Refinery #1, from distribution system
14       Fail             Sampled from long-term storage facility
15       Pass             Refinery #1, process varied, sampled at refinery

Table 2. Description of JP-5 fuels used to develop the correlation model (fuel set #1).




Results  and  Discussion 


To investigate the similarities among the fuel samples, a non-linear map of the data was
constructed (Figure 1). Non-linear mapping provides a visual means of displaying the
multidimensional chromatographic data in two dimensions while preserving the inter-chromatogram
distances. In interpreting this plot, the distance between two chromatograms is a measure of their
mathematical similarity (e.g., Euclidean distance). Thus, two chromatograms located near each
other in a non-linear map have similar chromatographic profiles. Ideally, one would like to see
the replicates of the same samples clustered together and the samples from the two classes of
fuels ordered into distinct groups. In this plot, the engine failing samples are 1, 3, 12 and 14;
all others are engine passing. Figure 1 clearly shows that samples 8 and 11 are different from the
other samples, since each clusters tightly away from the rest. The lack of distinct
clustering between the engine passing and failing classes suggests that the differences between
the classes are minor. The small distances between the engine passing samples and failing
samples 3 and 12 show that these two samples are more similar to the passing samples than are the
other failing samples, 1 and 14.
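The idea of preserving inter-chromatogram distances can be made concrete with a short sketch. The report does not name the mapping algorithm, so Sammon's non-linear mapping is assumed here; the code only evaluates the quantity such a mapping minimizes (the stress between original and mapped distances) and does not perform the optimization itself. Function names are hypothetical.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance for every unordered pair of points, in a fixed order."""
    return [math.dist(a, b) for a, b in combinations(points, 2)]


def sammon_stress(high_dim_points, map_points):
    """Sammon stress: weighted mismatch between original and mapped distances.

    Assumes all original points are distinct (no zero distances).
    """
    d_high = pairwise_distances(high_dim_points)
    d_map = pairwise_distances(map_points)
    total = sum(d_high)
    return sum((dh - dm) ** 2 / dh for dh, dm in zip(d_high, d_map)) / total
```

A stress of zero means the two-dimensional map reproduces every pairwise distance exactly; larger values indicate that some chromatograms appear closer together or farther apart in the map than they are in the full data space.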


Figure  1:  Non-linear  map  of  chromatographic  data  from  15  different  jet  fuel  samples.  Samples 
1,  3,  12  and  14  caused  engine  failures. 


Because of the placement of samples 8 and 11 in the non-linear map, further outlier analysis
was warranted. Principal components analysis (PCA) is a multivariate data reduction method that
provides both a useful visualization tool and a statistical baseline for outlier analysis. PCA finds
combinations of variables, or factors, that describe major trends in the data. In interpreting a
PCA scores plot, the distance between two points is a measure of their similarity. Thus, two points
located near each other in a PCA plot would be expected to have similar chromatographic information.
Ideally, replicates of the same sample would be clustered together and the samples from the two
classes of fuels (pass or fail) would be ordered into distinct groups. Figure 2 is a plot of the 75
chromatograms projected onto their first two PCs. Overlaid on the PCA plot is the 99%
confidence limit for the PC scores. The five chromatograms that lie outside of this limit all
come from sample 11.
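A minimal sketch of the PCA projection and outlier check is given below, using NumPy. The report does not state how its 99% limit was computed; the chi-square approximation used here (two retained PCs, large-sample limit) is an assumption, as are the function names.

```python
import math

import numpy as np


def pca_scores(X, n_components=2):
    """Mean-center X (rows = chromatograms) and project onto the leading PCs."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)  # fraction of variance per PC
    return scores, explained[:n_components]


def outside_limit(scores, confidence=0.99):
    """Flag rows whose variance-scaled score distance exceeds a chi-square limit.

    Assumption: exactly two PCs, for which the chi-square quantile with
    2 degrees of freedom is -2 * ln(1 - confidence).
    """
    if scores.shape[1] != 2:
        raise ValueError("this sketch assumes exactly two principal components")
    variances = scores.var(axis=0, ddof=1)
    t2 = np.sum(scores ** 2 / variances, axis=1)
    return t2 > -2.0 * math.log(1.0 - confidence)
```

With a 75 x 2048 matrix of chromatograms as `X`, `explained[0]` corresponds to the percentage printed on the first axis of a scores plot, and the rows flagged by `outside_limit` are candidate outliers for closer inspection.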

Since  one  of  the  goals  of  this  project  is  to  develop  a  single  classification  model  that  can 
discriminate  good  fuels  from  bad  fuels,  regardless  of  the  source,  subsequent  analysis  was 
performed  using  data  sets  with  and  without  sample  #11.  The  decision  to  remove  an  outlier  from 
model  development  is  non-trivial,  and  it  is  advisable  to  include  as  many  samples  as  possible, 
particularly in this case, where samples were limited. However, severe outliers such as sample
#11 can skew the PCA model such that the small differences necessary for distinguishing
between passing and failing samples would be masked. Potential causes of outliers include
samples taken from different sources, different jet fuel types (e.g., JP-8), or major changes in the
instrumental response. Visual inspection of the chromatograms confirmed that the GC profile of fuel
#11 was distinctly different from those of the other fuels in the data set. Fuel #11 was the only fuel
obtained  from  refinery  #3,  and  was  correctly  identified  as  an  outlier  in  the  analysis.  Thus,  the 
amount  of  variation  between  this  fuel  and  the  other  fuels  was  much  greater  than  the  differences 
between  the  fuels  from  the  same  source. 


Figure 2: Principal component analysis of chromatographic data from 15 jet fuel samples.
(Axis label: scores on principal component 1, 43.6%.)


Discriminant  partial-least-squares  regression  (DPLS)  was  used  to  classify  the  sample 
chromatograms  as  belonging  to  either  engine  passing  or  failing  fuel  samples.  DPLS  uses  a  linear 
approach  for  classification,  and  the  model  is  of  the  form 


$$c_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + \cdots + b_n x_{in} \qquad (1)$$

where $c_i$ is the predicted class for sample $i$, the $b$ terms are the multivariate regression
coefficients, and the $x_{ij}$ terms are the detector responses of sample $i$ at the $j = 1$ to $n$
data points ($n = 2048$).17
Two different prediction models were constructed: data set 1 contained the outlier fuel #11,
whereas data set 2 had this outlier removed. Two methods of model validation were
employed. Since multivariate calibration methods are prone to overfitting the data, they must be
properly  validated.  Ideally,  validation  of  the  model  would  be  done  with  an  external  prediction  set 
consisting  of  data  withheld  from  the  data  used  to  develop  the  model,  i.e.,  the  training  set. 
However,  in  this  instance,  samples  were  at  a  premium  and  internal  validation  methods  such  as 
cross-validation  were  used.  The  first  validation  method,  leave-one-out  cross-validation,  involves 
removing  one  chromatogram  from  the  data  set  and  predicting  its  fuel  class  based  on  the 
remaining  74  or  69  chromatograms,  for  data  sets  1  and  2,  respectively.  This  is  repeated  until  each 
of  the  chromatograms  has  been  removed  and  predicted  once.  The  second  validation  method, 
contiguous  blocks  cross-validation,  involves  removing  all  five  of  the  replicate  chromatograms  for 
one  sample  from  the  data  set  and  using  the  remaining  chromatograms  to  predict  the  class  of  the 
five  removed.  This  is  repeated  until  each  of  the  samples  has  been  removed  from  the  data  set  and 
predicted  once.  The  contiguous  blocks  cross-validation  method  is  a  more  rigorous  method  since 
it  predicts  the  samples  versus  the  other  14  samples.  In  comparison,  leave-one-out  predicts  each 
chromatogram  against  the  other  14  samples  and  4  replicates  of  the  same  sample.  Typically,  the 
leave-one-out  cross-validation  can  easily  over-estimate  the  classification  ability  of  the  model 
when a large number of replicates and a small number of samples exist. The optimal number of
latent variables, or PLS factors, was chosen as the number that minimized the predicted residual
sum of squares (PRESS) statistic under contiguous blocks cross-validation. This retains enough
information to describe the data without overfitting the specific samples in the training set.
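The difference between the two validation schemes is easiest to see in how the rows are split. The sketch below generates the train/test row indices for each scheme; it assumes, as stated earlier in the text, that replicate chromatograms are stored consecutively. The function names are hypothetical.

```python
def leave_one_out(n_rows):
    """Yield (train, test) index lists, holding out one chromatogram at a time."""
    for i in range(n_rows):
        yield [j for j in range(n_rows) if j != i], [i]


def contiguous_blocks(n_samples, n_replicates):
    """Yield (train, test) index lists, holding out all replicates of one sample.

    Assumes the replicates of each sample occupy consecutive rows.
    """
    n_rows = n_samples * n_replicates
    for s in range(n_samples):
        test = list(range(s * n_replicates, (s + 1) * n_replicates))
        held_out = set(test)
        yield [j for j in range(n_rows) if j not in held_out], test


if __name__ == "__main__":
    loo = list(leave_one_out(75))
    blocks = list(contiguous_blocks(15, 5))
    print(len(loo), len(loo[0][0]))        # 75 folds, 74 training rows each
    print(len(blocks), len(blocks[0][0]))  # 15 folds, 70 training rows each
```

In the leave-one-out split, four replicates of the held-out sample remain in the training rows, which is why that scheme tends to give optimistic results when replicates are many and samples are few.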

A  third  data  set  was  prepared  by  performing  a  systematic  search  of  the  chromatographic  data 
in  order  to  find  specific  retention  time  windows  that  have  discrimination  power  superior  to  the 
full  chromatogram.  This  method,  known  as  interval-partial  least  squares  regression  (i-PLS),  was 
performed  on  data  set  2.  i-PLS  determined  that  the  optimum  data  range  for  discrimination  was 
between data channels 392 and 683, resulting in a 70 x 292 data matrix (data set 3). The location of this
chromatographic  window  in  relation  to  the  full  chromatogram  is  shown  in  Figure  3.  The  DPLS 
classification results are given in Table 3. In the table, methods l-o-o and contiguous represent
leave-one-out cross-validation and contiguous blocks cross-validation, respectively. The table
shows that all the chromatograms are classified correctly using leave-one-out cross-validation,
except for two replicate chromatograms of sample 6 that are incorrectly classified with data set 3.
However, as noted above, contiguous blocks cross-validation is a considerably more rigorous validation.
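The size and position of the i-PLS window can be checked with a short calculation. The sketch assumes that channel 0 corresponds to time zero and that the reduced chromatograms are sampled at 2.5 points per second, as described in the Experimental section; the function name is hypothetical.

```python
def channel_to_minutes(channel, points_per_second=2.5):
    """Convert a data-channel index in the reduced chromatogram to minutes."""
    return channel / points_per_second / 60.0


first, last = 392, 683
print(last - first + 1)                       # 292 channels in the window
print(round(channel_to_minutes(first), 2))    # 2.61
print(round(channel_to_minutes(last), 2))     # 4.55
```

These values agree with the 292-column data matrix and with the retention time range of 2.61 to 4.55 minutes quoted in the Figure 3 caption.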


The results for data set 1 under contiguous blocks cross-validation show that the classification
performance is poor, with only 62.6% of the chromatograms being classified correctly. The essentially
unchanged results for data set 2 show that the removal of outlier sample 11 does not affect the
performance of the classification model. Significant improvements in classification performance
were obtained with data set 3. These improvements suggest that the subtle differences in the
chromatographic profiles in this retention time window may correlate with a fuel's performance
in the engine test. The results for data set 3,
with  contiguous  blocks  cross-validation  are  also  shown  in  Figure  4.  This  Figure  shows  the 
predicted  class  output  c  for  each  of  the  samples,  except  for  sample  11.  In  this  Figure,  a  c  value  of 
greater  than  zero  represents  a  predicted  engine  passing  fuel  and  a  c  of  less  than  zero  represents  a 


17 Thomas, E. V. Anal. Chem., 1992, 66, 795A-804A.

predicted engine failing fuel. Three engine failing sample chromatograms, belonging to fuel
samples 3 and 12, rise above the line c = 0 and are hence misclassified as engine passing fuels.
Four engine passing sample chromatograms, from fuel samples 2 and 6, fall below the line c = 0
and are misclassified as engine failing fuels.
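The linear model of equation (1) and the decision rule at c = 0 can be sketched with a small PLS1 implementation based on the NIPALS algorithm, using NumPy. This is an illustrative sketch, not the software used in the study: the function names are hypothetical, and coding the classes as +1 (pass) and -1 (fail) is an assumption suggested by, but not stated in, the use of zero as the class boundary.

```python
import numpy as np


def dpls_fit(X, y, n_factors):
    """Fit PLS1 by NIPALS; return the intercept b0 and coefficient vector b."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean          # mean-centered working copies
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = Xr.T @ yr                        # weight vector
        w = w / np.linalg.norm(w)
        t = Xr @ w                           # scores for this factor
        tt = t @ t
        p = Xr.T @ t / tt                    # X loadings
        qk = (yr @ t) / tt                   # y loading
        Xr = Xr - np.outer(t, p)             # deflate X and y
        yr = yr - qk * t
        W.append(w)
        P.append(p)
        q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)      # regression coefficients
    return y_mean - x_mean @ b, b


def dpls_predict(X, b0, b):
    """Equation (1): c_i = b0 + sum_j b_j * x_ij; c > 0 is read as 'pass'."""
    return b0 + np.asarray(X, dtype=float) @ b
```

In use, `n_factors` would be the value selected by cross-validation, and each row of the 292-channel window would be classified as passing when its predicted c is above zero and failing when it is below.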


Figure  3.  Range  of  data  channels  picked  by  interval-PLS.  (2.61  to  4.55  minutes) 


Data Set   Method       Class: Pass       Class: Fail       Overall
                        correct/n (%)     correct/n (%)     % correct

1          l-o-o        55/55 (100)       20/20 (100)       100
2          l-o-o        50/50 (100)       20/20 (100)       100
3          l-o-o        48/50 (96)        20/20 (100)       97.14
1          contiguous   41/55 (74.5)      6/20 (30)         62.6
2          contiguous   37/50 (74)        6/20 (30)         61.4
3          contiguous   46/50 (92)        17/20 (85)        90
4          DS3          44/50 (88)        NA                88

Table 3. DPLS predictions by the model for the training sample set, using two methods of
validation (l-o-o: leave-one-out; contiguous: contiguous blocks). Data set #1 is all the fuel
samples, and data set #2 is the same set with the outlier fuel #11 removed.
A blind prediction test was performed on a distinctly different group of fuels (data set #4).
Gas chromatographic data from a set of ten blind fuel samples were analyzed by DPLS to further
validate the model performance. Five replicate chromatograms were received for each sample.
The chromatograms were reduced in size to the same region chosen by the i-PLS model, resulting
in a 50x292 data matrix. A new DPLS model was built using all 14 samples from data set #3. As
shown in Figure 5, the DPLS model classified one replicate from six of the ten fuel samples as
failing, with the remaining 44 of the 50 sample analyses as passing (88%). Subsequent
combustion testing classified all the samples in data set #4 as passing.
Figure 4. DPLS cross-validation results from data set #3, contiguous blocks method.


Summary  of  Results 

The  results  of  this  preliminary  study  illustrate  how,  with  proper  preprocessing  methods, 
fuels  can  be  classified  with  respect  to  performance  on  the  basis  of  subtle  differences  in  fuel 
constituency.  The  findings  also  illustrate  the  impact  that  relatively  minor  variations  in  GC 
operating  conditions  can  exert  on  multivariate  correlations  when  the  analysis  is  focused  on  the 
minor  differences  between  samples.  While  it  is  important  to  consider  all  the  data,  proper 
preprocessing  of  the  raw  data  is  essential  in  order  to  focus  the  analysis  on  the  differences  between 
the  samples  without  undue  interference  from  those  fuel  constituents  common  to  all  fuels  and  to 
minimize  instrumental  variations  so  that  the  method  can  be  packaged  and  used  with  any  suitable 
instrumentation.  It  was  also  shown  that  the  discriminating  power  of  the  model  could  be  improved 
by  focusing  on  those  portions  of  the  data  that  have  greater  statistical  weight,  thus  reducing  the 
amount  of  extraneous  information  in  the  data  set. 
In this study, only a limited number of failed fuels were available, and the levels of
uncertainty reached in the validation samples reflect this. These uncertainties would be
significantly lower if the number of available failed fuels were greater. However, using the
preliminary model developed with these data, we were able to correctly screen a group of fuels
that passed the combustor test, with very little uncertainty.
Figure 5. DPLS prediction results from jet fuel data set #4.
While chemometrics can be successfully employed to discriminate between different fuel
types or sources, there are practical limits on the use of gas chromatography with fuels that have
substantially  different  compositions.  In  order  to  develop  useful  correlations  between  composition 
and  performance  with  multivariate  analysis  of  fuels  from  widely  distributed  sources  and  crude 
slates,  the  methods  developed  in  this  study  could  be  focused,  through  interval  analyses,  on  the 
compounds  of  interest.  For  example,  if  a  particular  compound  class  was  identified  as  being 
significant,  the  fuel  could  be  preprocessed  and  a  model  developed  for  the  subset  of  all  fuel 
constituents  represented  by  that  compound  class.  However,  chemical  changes  that  can  lead  to 
undesirable  fuel  property  changes  are  generally  due  to  the  presence  of  chemically  active  fuel 
constituents,  such  as  heteroatomic  species  and  transition  metal  chelates.  Thus,  while  GC  with 
flame  ionization  detection  would  not  necessarily  respond  directly  to  metal  compounds  and 
perhaps  some  heteroatomic  species,  their  impact  on  hydrocarbon  constituency  would  be  detected 
by  this  method,  if  the  magnitude  of  the  changes  were  within  detection  limits. 
3.2 Monitoring Diesel Fuel Degradation by GC-MS and Chemometric Analysis


This  study  was  undertaken  in  order  to  determine  if  the  application  of  multi-way  chemometric 
techniques  to  a  3-dimensional  dataset,  such  as  GC-MS,  would  overcome  the  limitations  of  GC 
alone  in  characterizing  minute  chemical  changes  in  fuels,  regardless  of  source. 

Experimental 

Fuel Stressing. A specification naval distillate fuel (NATO F-76) was used for these
studies.  In  storage  stability  testing  by  the  LPR  method  in  accordance18  with  ASTM  D5304,  this 
fuel  produced  0.7  mg  sediment  per  100  mL.  Two  methods  of  thermally  stressing  the  fuel  were 
used  in  order  to  provide  fuel  samples  with  varying  levels  of  thermal  degradation  for  analysis  by 
GC-MS.  In  the  first  study,  100  mL  of  fuel  was  placed  in  a  vented  250  mL  borosilicate  glass 
bottle  and  stored  in  an  oven,  in  the  dark  at  60  °C.  Aliquots  (1  mL)  were  periodically  withdrawn 
without  cooling  for  GC-MS  analysis  at  0,  7,  14,  24,  31,  and  37  days  of  oven  stress.  In  a  second 
study,  fuels  were  subjected  to  stress  in  a  closed,  low  pressure  reactor  (LPR).  LPR  stress  was 
performed  on  aliquots  of  fuel  in  accordance  with  ASTM  D5304  (i.e.,  at  a  temperature  of  100  °C 
and  under  an  atmosphere  of  100  psig  oxygen)  for  both  the  standard  duration  of  16  hours  and  also 
an  extended  duration  of  42  hours.  The  extended  duration  LPR  conditions  were  chosen  to 
guarantee  significant  oxidative  changes  in  the  fuel.  After  the  fuel  samples  were  removed  from 
the LPR, they were cooled in the dark and 2 mL aliquots were filtered through a 0.2 µm nylon
Millipore  filter  for  analysis.  A  summary  of  the  LPR-stressed  samples  analyzed  by  GC-MS  is 
presented  in  Table  4. 

GC-MS  Analysis.  GC-MS  data  were  obtained  using  an  HP  5890  Series  II  gas  chromatograph 
coupled  to  an  HP  5971  mass  selective  detector.  Replicate  dilutions  were  prepared  and  analyzed 
for each fuel sample: three replicates per oven-stressed sample and six replicates per
LPR-stressed sample. Each replicate was prepared by dissolving 7.5 µL of fuel in 1500 µL
dichloromethane. An HP 6890 injector and autosampler delivered 1.0 µL aliquots of each
replicate  sample  to  the  GC  in  a  random  order.  A  split/splitless  injector  at  250  °C  with  a  split  flow 
ratio  of  60:1  was  used  along  with  a  50  m  x  0.2  mm  Agilent  HP-1  (dimethylpolysiloxane) 
capillary  column.  The  oven  temperature  profile  was  60  °C  to  288  °C  at  3  °C/min,  giving  a  run 
time  of  76  minutes.  A  solvent  delay  of  4.40  minutes  was  used  which  reduced  the  data  acquisition 
time  to  71.6  minutes  per  run. 

Chemometric  Analysis.  Following  acquisition,  GC-MS  chromatograms  were  preprocessed  to 
minimize  any  undesired  variation  between  chromatograms.  First,  all  chromatograms  were 
normalized  to  unit  area  to  minimize  effects  due  to  variation  in  injected  sample  volume.  Next,  the 
chromatograms  were  retention  time  aligned  in  order  to  minimize  retention  time  variation  between 
chromatograms  due  to  unavoidable  fluctuations  in  instrument  parameters  (e.g.  oven  temperature, 
flow  rate,  etc.)  during  the  course  of  the  experiments. 
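As a minimal sketch of the first preprocessing step, unit-area normalization can be written as below. It assumes the chromatograms are held in a NumPy array with one sample per leading index and that "area" is taken as the plain sum of all data points for that sample. The function name and the toy data are illustrative only.

```python
import numpy as np

def normalize_unit_area(chromatograms):
    """Scale each sample (first axis) so that its summed signal equals one."""
    data = np.asarray(chromatograms, dtype=float)
    area = data.reshape(data.shape[0], -1).sum(axis=1)
    # Reshape the per-sample areas so they broadcast over the remaining axes.
    return data / area.reshape((-1,) + (1,) * (data.ndim - 1))

# Two hypothetical injections of the same fuel; the second delivered 30% more sample.
profile = np.array([0.0, 2.0, 6.0, 2.0, 0.0])
raw = np.vstack([1.0 * profile, 1.3 * profile])
normalized = normalize_unit_area(raw)
```

After normalization the two rows coincide, which is the intended removal of injection-volume effects.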


18  ASTM,  “Standard  Test  Method  for  Assessing  Middle  Distillate  Fuel  Storage  Stability  by 
Oxygen  Overpressure”.  In  Annual  Book  of  ASTM  Standards;  ASTM:  Philadelphia,  2003;  Vol. 
05.03,  ASTM  D  5304-03. 
Sample Number    Stress Conditions

1-6              Unstressed
7-11             LPR, 16 hour
12-17            LPR, 42 hour
18-23            LPR, 42 hour, Cu

Table  4.  Summary  of  fuel  stress  conditions  imposed  on  the  23  samples  of  naval  distillate  fuel 
(NATO  F-76). 


Retention  time  alignment  for  these  experiments  was  accomplished  via  a  two-step  procedure. 
First,  a  total  ion  current  (TIC)  chromatogram  for  each  GC-MS  chromatogram  was  constructed  by 
taking  the  sum  along  the  mass  spectral  axis  in  the  data.  The  set  of  all  TIC  chromatograms  was 
then  subjected  to  a  manual  peak  alignment  procedure  where  corresponding  peaks  from 
chromatogram  to  chromatogram  were  visually  identified  and  tabulated.  Taking  the  first 
chromatogram  as  the  target  for  alignment,  the  required  shifts  for  the  remaining  chromatograms 
were  calculated.  In  the  second  step,  these  required  shifts  were  applied  to  the  retention  time  axis 
of  the  GC-MS  chromatograms  via  interpolation. 
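The two steps can be sketched as follows, assuming each GC-MS chromatogram is a (scans x m/z) array and, for simplicity, that a single constant shift per chromatogram is sufficient. The text describes shifts derived from manually tabulated peaks, which may vary along the run. The function names and the 2.5-scan shift are hypothetical.

```python
import numpy as np

def total_ion_current(gcms):
    """Sum a (scans x m/z) GC-MS matrix along the mass spectral axis."""
    return gcms.sum(axis=1)

def shift_retention_axis(gcms, shift):
    """Resample every m/z channel so that features move earlier by `shift` scans."""
    scans = np.arange(gcms.shape[0], dtype=float)
    aligned = np.empty_like(gcms, dtype=float)
    for j in range(gcms.shape[1]):
        # Linear interpolation; values beyond the ends are held at the edge value.
        aligned[:, j] = np.interp(scans + shift, scans, gcms[:, j])
    return aligned

# Synthetic example: one component eluting 2.5 scans late relative to the target.
scans = np.arange(100, dtype=float)
def peak(center):
    return np.exp(-0.5 * ((scans - center) / 4.0) ** 2)

spectrum = np.array([1.0, 0.4, 0.1])            # hypothetical 3-channel mass spectrum
target = np.outer(peak(50.0), spectrum)
sample = np.outer(peak(52.5), spectrum)

shift = 2.5   # in practice, taken from the tabulated peak positions
aligned = shift_retention_axis(sample, shift)
apex_target = int(np.argmax(total_ion_current(target)))
apex_after = int(np.argmax(total_ion_current(aligned)))
```

Applying the same shift to every m/z channel keeps each mass spectrum intact while moving the whole chromatogram onto the target retention axis.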

Since  the  data  acquired  for  this  work  were  accumulated  over  a  relatively  short  time  period  on 
a  single  instrument,  the  instrumental  parameters  could  be  more  closely  controlled  than  is 
generally  possible  in  daily  routine  analyses.  Given  the  relative  stability  of  the  GC-MS  data 
acquired,  it  was  unnecessary  to  add  internal  standards  for  retention  time  alignment  and 
normalization  purposes  in  this  experiment.  In  general  practice  under  conditions  where  carrier  gas 
flow  rate  variations  are  encountered,  it  certainly  may  be  necessary  to  employ  internal  standards. 
In  previous  studies19,  we  have  successfully  corrected  non-linear  variations  in  gas  chromatography 
carrier  gas  flow  rates  by  aligning  chromatograms  within  portions  of  the  data  bracketed  by  the 
major  hydrocarbon  peaks.  Such  a  windowed  approach  to  baseline  peak  alignment  is  a  viable 
approach  to  minimizing  instrumental  variations  that  could  be  encountered  in  actual  use  and  when 
the  best  possible  quantitative  precision  is  required. 

Datasets  for  each  experiment  consisted  of  a  series  of  two-dimensional  GC-MS 
chromatograms,  one  for  each  sample  analyzed,  stacked  on  each  other  to  form  a  three-dimensional 
array,  or  cube  of  data.  These  data  sets  were  probed  for  chromatographic  features  that  described 
the  difference  between  different  fuel  blends  using  the  ANOVA  based  feature  selection  algorithm 
as  described  by  Johnson20,  and  implemented  by  an  in-house  program  written  for  MATLAB 
(Mathworks  Inc.,  Natick,  MA).  Feature  selection  was  performed  by  grouping  a  set  of  data  into 
defined  classes  and  looking  for  data  points  which  best  describe  the  differences  between  classes, 
while  remaining  the  same  within  a  given  class.  This  is  accomplished  by  performing  the  ANOVA 
f-ratio  calculations,  shown  in  equations  2-5,  for  each  data  point  in  the  two  dimensional  space  of 
the  GC-MS  data.  In  this  manner,  a  complex  data  set  was  rapidly  and  automatically  scanned  for 
features  important  for  a  given  classification.  Multivariate  models  of  GC-MS  data  were  then 


19 Morris, R. E.; Hammond, M. H.; Shaffer, R. E.; Gardner, W. P.; Rose-Pehrsson, S. L. Energy
& Fuels, 2004, 18, 485.

20 Johnson, K. J.; Synovec, R. E. Chemom. Intell. Lab. Syst. 2002, 60, 225-237.
constructed to describe compositional differences between fuel samples. Modeling was
accomplished via the PLS toolbox and the Nway toolbox21 (version 2.1). Three separate
techniques were used to model changes in fuel composition: multi-way principal components
analysis (MPCA), parallel factor analysis (PARAFAC), and multilinear partial least squares
regression (N-PLS)22,23,24,25.

MPCA  amounts  to  unfolding  each  two  dimensional  chromatogram  into  a  vector  of  data  and 
subjecting  the  resulting  set  of  vectors  to  standard  PCA26.  PCA  functions  by  constructing  a  basis 
set  of  orthogonal  vectors  that  most  efficiently  describe  the  variation  present  in  the  data  set,  known 
as  loadings,  and  the  projections  of  each  sample  onto  a  particular  loadings  vector  are  known  as 
scores. Prior to application of MPCA, the GC-MS data in this work were subjected to an
eight-point boxcar average along the chromatographic dimension in order to reduce the size of
the data set, owing to memory limitations of the personal computer on which the analysis was
performed.
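The unfold-then-PCA idea can be sketched as below. This is an illustration under stated assumptions rather than the PLS toolbox routine used in the study: the boxcar average is taken over non-overlapping blocks of scans (so that it reduces the data size), the unfolded data are mean-centered, and PCA is computed by SVD. The synthetic cube only mimics a progressive change over the oven-stress days.

```python
import numpy as np

def boxcar_average(cube, width):
    """Average non-overlapping blocks of `width` scans along the retention axis."""
    n_samples, n_scans, n_mz = cube.shape
    n_keep = (n_scans // width) * width          # drop any incomplete trailing block
    blocks = cube[:, :n_keep, :].reshape(n_samples, n_keep // width, width, n_mz)
    return blocks.mean(axis=2)

def mpca(cube, n_components):
    """Unfold each sample into a row vector, mean-center, and run PCA by SVD."""
    unfolded = cube.reshape(cube.shape[0], -1)
    centered = unfolded - unfolded.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    # Reshape each loadings vector back to (scans x m/z) matrix form.
    loadings = vt[:n_components].reshape((n_components,) + cube.shape[1:])
    return scores, loadings

# Synthetic cube: 6 samples x 64 scans x 10 m/z, changing steadily with stress time.
rng = np.random.default_rng(1)
base = rng.random((64, 10))
change = rng.random((64, 10))
days = np.array([0, 7, 14, 24, 31, 37], dtype=float)
cube = np.stack([base - 0.01 * d * change for d in days])
cube += rng.normal(0.0, 1e-3, size=cube.shape)

reduced = boxcar_average(cube, width=8)
scores, loadings = mpca(reduced, n_components=1)
trend = abs(np.corrcoef(days, scores[:, 0])[0, 1])
```

The sign of a principal component is arbitrary, so only the magnitude of the correlation between the scores and the stress duration is meaningful here.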

PARAFAC  is  conceptually  similar  to  PCA,  except  that  it  functions  on  three-dimensional 
arrays.  Thus,  each  independent  factor  in  a  PARAFAC  model  consists  of  three  loadings  vectors, 
one for each dimension of the original data set (i.e., retention time, mass spectrum, and
sample, the last reflecting relative abundance). The loadings vectors generated by PARAFAC
are calculated in an iterative fashion to
best  describe  the  overall  variation  in  the  data  set,  given  a  specified  number  of  factors  as  well  as 
any constraints that may be imposed on any of the modes, such as non-negativity or unimodality.

The advantage of PARAFAC lies in the incorporation of a third dimension, which provides
unique solutions to the factor decomposition problem. Thus, if the underlying factors fit the
PARAFAC model, concentration and spectral features can be extracted directly. According to the
strengths and weaknesses of each technique, bulk changes in fuel composition (i.e., changes
reflected  in  the  overall  chromatogram)  were  modeled  with  MPCA  due  to  their  deviation  from 
trilinearity,  while  PARAFAC  was  used  to  model  smaller  local  regions  of  GC-MS  data  in  order  to 
extract  concentration  profiles  and  mass  spectra  of  pure  fuel  components. 
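The trilinear model behind PARAFAC can be illustrated with a bare alternating least squares (ALS) fit. This sketch is unconstrained and uses a fixed iteration count, so it is not the N-way toolbox algorithm and omits the non-negativity and unimodality constraints mentioned above. The function names and the synthetic two-component data are hypothetical.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product; rows ordered with U's index varying slowest."""
    return (U[:, None, :] * V[None, :, :]).reshape(-1, U.shape[1])

def parafac_als(X, rank, n_iter=1000, seed=0):
    """Fit X[i,j,k] ~ sum_r A[i,r] * B[j,r] * C[k,r] by alternating least squares."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    X0 = X.reshape(I, J * K)                       # mode-0 unfolding
    X1 = np.moveaxis(X, 1, 0).reshape(J, I * K)    # mode-1 unfolding
    X2 = np.moveaxis(X, 2, 0).reshape(K, I * J)    # mode-2 unfolding
    for _ in range(n_iter):
        # Each update is the least-squares solution with the other two modes fixed.
        A = X0 @ np.linalg.pinv(khatri_rao(B, C)).T
        B = X1 @ np.linalg.pinv(khatri_rao(A, C)).T
        C = X2 @ np.linalg.pinv(khatri_rao(A, B)).T
    return A, B, C

# Synthetic local region: 8 samples x 30 scans x 12 m/z, two overlapping components.
scans = np.arange(30)
elution = np.stack([np.exp(-0.5 * ((scans - 12) / 3.0) ** 2),
                    np.exp(-0.5 * ((scans - 17) / 3.0) ** 2)], axis=1)
rng = np.random.default_rng(2)
spectra = rng.random((12, 2))
amounts = np.stack([np.linspace(1.0, 0.3, 8), np.ones(8)], axis=1)  # one decreasing
X = np.einsum('ir,jr,kr->ijk', amounts, elution, spectra)

A, B, C = parafac_als(X, rank=2)
X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
rel_error = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
```

The recovered factors are defined only up to scaling and ordering of the components, so the columns of A, B, and C must be matched and normalized before being read as concentration profiles, elution profiles, and mass spectra.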

N-PLS27,28,29,30 is a multi-way generalization of the commonly used partial least squares
regression algorithm designed to be used with second or higher order data sets. In accordance
with the philosophy behind first-order PLS regression, the algorithm seeks to decompose the data
into a factor model that best describes the covariance between the independent (predictor) and
dependent (predicted) variables. This model is then applied to subsequently measured data to
make quantitative predictions. The chief difference between PLS and N-PLS is that, in N-PLS,
the factor model is trilinear in form (as with PARAFAC). N-PLS has previously been shown31,32


21 Andersson, C. A.; Bro, R. Chemom. Intell. Lab. Syst. 2000, 52, 1-4.

22 Smilde, A. K. Chemom. Intell. Lab. Syst. 1992, 15, 143-157.

23 Henrion, R. Chemom. Intell. Lab. Syst. 1994, 25, 1-23.

24 Dahl, K. S.; Piovoso, M. J.; Kosanovich, K. A. Chemom. Intell. Lab. Syst. 1999, 46, 161-180.

25 Bro, R. Chemom. Intell. Lab. Syst. 1997, 38, 149-171.

26 Malinowski, E. R. "Factor Analysis in Chemistry" 2nd ed., John Wiley & Sons, New York,
1991.

27 Bro, R. J. Chemom. 1996, 10(1), 47-61.

28 Smilde, A. K. J. Chemom. 1997, 11(5), 367-377.

29 de Jong, S. J. Chemom. 1998, 12, 77-81.

30 Bro, R.; Smilde, A. K.; de Jong, S. Chemom. Intell. Lab. Syst. 2001, 55(1), 3-13.

31 Johnson, K. J.; Prazen, B. J.; Young, D. C.; Synovec, R. E. J. Sep. Sci. 2004, 27, 410-416.
to be effective in the quantification of multi-component compositional properties of both
industrial naphtha and fuel samples by GC×GC.
Results and Discussion


There  are  two  basic  premises  for  applying  chemometric  methods  to  instrumental  data.  The 
first  is  that  the  dataset  constitutes  an  accurate  numerical  representation  of  the  relevant  chemical 
information,  and  the  second  is  that  the  differences  between  the  different  classes  of  samples  are 
statistically  greater  than  the  differences  between  samples  within  the  same  class.  If  these  two 
criteria  are  met,  then  it  should  be  possible  to  develop  predictive  and  diagnostic  models  based  on 
these  numerical  representations  of  fuel  composition.  Moreover,  a  mathematical  discrimination  of 
all  the  significant  differences  in  fuel  composition  after  stress  could  reveal  subtle  features  highly 
correlated  with  fuel  quality  that  could  have  gone  otherwise  undetected. 


32 Prazen, B. J.; Johnson, K. J.; Synovec, R. E. Anal. Chem. 2001, 73(23), 5677-5682.
The data resulting from a single GC-MS analysis of a typical unstressed diesel fuel can be
depicted  in  a  two-dimensional  plot  as  shown  in  Figure  6,  which  is  a  topographical  contour  plot 
with  a  single  contour  line  drawn  at  a  value  of  signal  intensity  of  three  times  the  baseline  noise.  In 
this  Figure,  the  x-axis  is  retention  time  in  terms  of  mass  spectral  scan  number  and  the  y-axis  is  the 
mass  spectral  axis  and  encompasses  mass  to  charge  (m/z)  ratios  of  50  to  300.  Thus,  each  slice  of 
this  plot  taken  along  the  x-axis  represents  the  mass  spectrum  recorded  at  a  given  retention  time  in 
the  GC  separation.  A  data  set  resulting  from  multiple  GC-MS  analyses  can  thus  be 
conceptualized  as  a  series  of  two-dimensional  GC-MS  chromatograms  stacked  one  on  top  of 
another  to  form  a  cube.  Therefore,  in  addition  to  dimensions  describing  GC  retention  time  and 
MS  mass  to  charge  ratios,  such  a  data  set  also  has  a  third  dimension  reflecting  the  identity  of  the 
different  samples  subjected  to  GC-MS  analysis. 

An  initial  examination  of  the  GC-MS  data  indicated  that  the  chromatographic  retention  time 
precision  and  signal  reproducibility  were  relatively  high.  For  example,  the  data  collected  in  the 
study  of  LPR-stressed  fuels  exhibited  a  mean  standard  deviation  in  peak  position  of  2.7  scan 
numbers  across  286  peaks  in  the  18  chromatograms.  This  deviation  is  equal  to  only  roughly  one 
fourth  of  the  typical  peak  width  and,  thus,  manual  alignment  of  the  chromatographic  profiles  was 
readily  achieved.  Turning  to  signal  reproducibility,  an  examination  of  replicate  total  ion  current 
chromatograms  yielded  a  mean  RSD  of  9.6  percent  across  the  entire  chromatographic  run  for  the 
raw  data,  and  a  mean  RSD  of  3.3  percent  for  data  that  had  been  area  normalized  and  aligned. 
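One way such a reproducibility figure might be computed is sketched below. The text does not say whether the RSD was evaluated per scan or per peak; this example assumes a per-scan RSD across replicate TICs, averaged over the run, and uses made-up replicate data.

```python
import numpy as np

def mean_rsd_percent(replicate_tics):
    """Mean over scans of the relative standard deviation across replicates, in percent."""
    tics = np.asarray(replicate_tics, dtype=float)
    mean = tics.mean(axis=0)
    sd = tics.std(axis=0, ddof=1)        # sample standard deviation across replicates
    return 100.0 * np.mean(sd / mean)    # assumes strictly positive mean signal

# Three hypothetical replicates that differ only in injected amount.
profile = np.array([1.0, 4.0, 9.0, 4.0, 1.0])
replicates = np.vstack([0.9 * profile, 1.0 * profile, 1.1 * profile])

raw_rsd = mean_rsd_percent(replicates)
area_normalized = replicates / replicates.sum(axis=1, keepdims=True)
normalized_rsd = mean_rsd_percent(area_normalized)
```

In this idealized case normalization removes all of the variation; with real data, residual noise and retention time shifts leave a nonzero RSD such as the 3.3 percent reported above.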

An  MPCA  model  was  constructed  from  data  acquired  in  the  analysis  of  the  oven  stressed  neat 
fuel  samples.  The  scores  on  the  first  principal  component  of  the  MPCA  model  are  shown  in 
Figure  7.  This  plot  indicates  that  there  is  a  clearly  defined  progressive  change  in  composition  as 
the  fuel  was  subjected  to  increasing  durations  of  oven  stress  at  60°C. 

The loadings vector (reshaped to matrix form) associated with the first principal component
of the MPCA model represents the portions of the GC-MS data that were undergoing changes
during oven stress. The loadings plot thus derived indicated that much of this change was
associated  with  a  decrease  in  the  concentrations  of  early  eluting  (i.e.  high  volatility)  components, 
which  is  consistent  with  evaporative  loss  during  stress,  rather  than  chemical  changes  associated 
with  fuel  degradation.  Models  constructed  from  fuels  spiked  with  copper  demonstrated  similar 
behavior. 

The  extent  of  the  compositional  changes  associated  with  evaporative  loss  was  further 
examined  with  the  ANOVA  based  feature  selection  utility.20  The  f-ratios  from  this  analysis  are 
shown  in  Figure  8  as  a  contour  plot  drawn  with  a  single  contour  line  at  an  f-ratio  value  just  above 
baseline  noise.  The  variations  from  sample  to  sample  shown  for  this  data  set  in  Figure  7  are  thus 
dominated  by  decreasing  concentrations  of  early  eluting  components.  This  finding  indicates  that 
the  process  of  evaporative  loss,  rather  than  fuel  degradation  is  being  modeled.  To  further 
illustrate  this  point,  a  feature-selected  total  ion  current  (TIC)  chromatogram  was  constructed  by 
including  only  data  points  with  an  ANOVA  f-ratio  greater  than  80,  as  shown  in  Figure  9.  This 
feature-selected  TIC  thus  represents  only  those  fuel  components  that  had  changed  during  the  oven 
test.  When  compared  with  the  TIC  of  the  unstressed  fuel,  two  things  are  evident:  1)  the  ANOVA 
f-ratio  test  is  able  to  detect  subtle  changes  in  composition  from  GC-MS  data  and  2)  the  lighter 
components  were  being  lost  during  oven  testing. 
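The feature selection and the feature-selected TIC can be sketched as follows. The f-ratio here is the standard one-way ANOVA ratio of between-class to within-class variance, evaluated independently at every (scan, m/z) point; it is assumed, not confirmed, to match equations 2-5. The synthetic cube, the class labels, and the function names are hypothetical, while the threshold of 80 is the value quoted above.

```python
import numpy as np

def anova_f_ratio(cube, labels):
    """One-way ANOVA F ratio at each (scan, m/z) point of a samples x scans x m/z cube."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n_total, n_classes = cube.shape[0], classes.size
    grand_mean = cube.mean(axis=0)
    between = np.zeros(cube.shape[1:])
    within = np.zeros(cube.shape[1:])
    for cls in classes:
        members = cube[labels == cls]
        class_mean = members.mean(axis=0)
        between += members.shape[0] * (class_mean - grand_mean) ** 2
        within += ((members - class_mean) ** 2).sum(axis=0)
    between /= n_classes - 1             # between-class mean square
    within /= n_total - n_classes        # within-class mean square
    return between / within

def feature_selected_tic(chromatogram, f_ratio, threshold):
    """Zero every point whose F ratio does not exceed the threshold, then sum over m/z."""
    return np.where(f_ratio > threshold, chromatogram, 0.0).sum(axis=1)

# Synthetic data: 3 unstressed and 3 stressed replicates, 40 scans x 5 m/z channels.
rng = np.random.default_rng(3)
base = rng.random((40, 5)) + 1.0
cube = np.stack([base + rng.normal(0.0, 0.01, base.shape) for _ in range(6)])
labels = np.array([0, 0, 0, 1, 1, 1])
cube[labels == 1, 10, 2] -= 1.0          # one component is depleted by stress

f_ratio = anova_f_ratio(cube, labels)
top_feature = np.unravel_index(np.argmax(f_ratio), f_ratio.shape)
fs_tic = feature_selected_tic(cube[0], f_ratio, threshold=80.0)
```

Points that vary only through measurement noise give small f-ratios and are suppressed, so the feature-selected TIC retains mainly the signal of components whose abundance differs between classes.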
Figure 6. Typical GC-MS chromatogram of an unstressed diesel fuel, presented as a
topographical contour plot with only one contour line, drawn at a signal intensity of three times
the baseline noise.

Figure 7. MPCA model of oven-stressed fuel, showing a clear progression in the scores on the
first principal component as the duration of oven stress at 60 °C is increased.

Figure 8. ANOVA f-ratios for oven-stressed fuel, which provide a measure of how well each
data point describes the differences as oven stress increased.


In order to minimize evaporative losses, a second study was conducted in which neat and
copper-doped fuel samples were thermally stressed in a low pressure reactor (LPR) at 100 °C
under 100 psig oxygen for up to 42 hours. Figure 10 depicts ANOVA f-ratios for the
LPR-stressed fuel data set. The f-ratio calculation was made by defining the classes of samples
as unstressed, 16 hour LPR stressed neat, 42 hour LPR stressed neat, and 42 hour LPR stressed
with Cu, thus providing a measure of how well each data point in the GC-MS chromatograms
describes the difference between LPR stressed and unstressed fuels. As can be seen in Figure 10,
evaporative loss does not dominate the changes associated with LPR stress of this fuel, as it did
in the first oven stress study.

An MPCA analysis of the LPR-stressed fuel data set was conducted and the scores on the
first principal component are shown in Figure 11. The first six samples are replicates of
unstressed fuel, the next five were 16 hour LPR-stressed fuel, the next six 42 hour LPR-stressed
fuel, and the final six 42 hour LPR-stressed fuel with added copper. From this plot, there is a
clear differentiation between stressed and unstressed fuels (i.e., the variation among replicates
is much less than the variation between classes), but essentially no difference between
stressed fuels with and without copper. The loadings from this first principal component of the
MPCA model agree very closely with the features defined by the ANOVA f-ratio calculation, and
this indicates that the MPCA model is describing the difference between stressed and unstressed
fuels without the need for feature selection prior to MPCA. A second ANOVA f-ratio calculation
Figure 9. TIC of unstressed fuel and the feature-selected TIC of the components that have been
altered during oven stress.
was performed, this time using only the 42 hour LPR-stressed fuels and defining two classes:
fuels  stressed  with  copper  and  fuels  stressed  neat.  This  analysis  failed  to  locate  any 
chromatographic  features  capable  of  making  this  classification.  This  suggests  that,  even  though 
copper  can  accelerate  the  rate  of  fuel  autoxidation,  in  this  instance,  the  presence  of  copper  had 
little  or  no  significant  impact  on  the  final  composition  of  the  LPR-stressed  fuel. 

The  feature-selected  TIC  displayed  in  Figure  12  was  computed  using  the  ANOVA  f-ratios 
shown  in  Figure  10  to  filter  the  chromatogram  of  an  unstressed  fuel  so  that  only  the 
chromatographic  features  that  differ  significantly  between  the  LPR-stressed  and  unstressed  fuels 
are  seen.  The  raw  (i.e.  non-feature  selected)  TIC  of  the  unstressed  fuel  is  also  shown,  for 
comparison.  In  this  instance,  the  feature  selected  TIC  was  calculated  by  including  only  data 
points  with  an  ANOVA  f-ratio  greater  than  100.  This  threshold  value  was  chosen  empirically, 
based  on  a  visual  inspection  of  all  of  the  calculated  ANOVA  f-ratios,  which  demonstrated  that 
ANOVA  f-ratios  in  the  baseline  areas  of  the  chromatogram  were  all  less  than  100.  A  more 
rigorous  optimization  of  this  threshold  was  not  undertaken  as  the  threshold  obtained  by  visual 
inspection  performed  adequately  for  the  purposes  of  this  study.  Figure  12B  shows  an  enlarged 
region  of  Figure  12A  and  demonstrates  the  degree  to  which  fuel  components  which  changed 
during  LPR  stress  were  intermingled  with  those  that  did  not  in  the  GC-MS  chromatogram. 

An  attempt  was  made  at  spectral  deconvolution  of  an  ANOVA-selected  feature  from  GC-MS 
data  of  LPR-stressed  fuel.  The  boxed  region  in  Figure  12B  was  subjected  to  PARAFAC 
decomposition  with  unimodality  and  nonnegativity  constraints  on  the  GC  mode,  and 
nonnegativity  constraint  on  the  mass  spectral  mode.  A  five-component  PARAFAC  model  was 
then  constructed  and  the  chromatographic  and  sample  mode  loadings  are  shown  in  Figure  13.  As 
seen  in  Figure  13A,  four  discrete  Gaussian  peaks,  as  well  as  a  fifth  peak  that  was  somewhat  less 
well-defined,  were  extracted  from  the  region  shown  in  Figure  12B.  One  of  this  set  of  four  peaks 


[Figure 10 plot. x-axis: Retention Time (scan number); y-axis: m/z]

Figure  10.  ANOVA  f-ratios  for  changes  in  composition  of  LPR-stressed  fuels.  In  this  case, 
evaporative  losses  do  not  dominate  the  changes  associated  with  LPR  stress,  as  they  do  with  oven 
stress. 


Figure  11.  MPCA  scores  on  the  first  principal  component  of  LPR-stressed  fuel  showing  a  clear 
delineation  between  stressed  and  unstressed  fuels. 
occurs at the same retention time as the desired feature located by the ANOVA feature selection
algorithm and is drawn with a solid line in Figure 13A.

Figure  13B  shows  the  concentration  profiles  of  these  extracted  components  by  sample 
number. The first six samples are replicates of unstressed fuel, the next five were 16 hour
LPR-stressed fuel, the next six 42 hour LPR-stressed fuel, and the final six 42 hour LPR-stressed fuel
with  added  copper.  It  can  be  seen  that  one  extracted  component  in  this  region  is  decreasing  in 
concentration  with  LPR  stress  while  the  others  remain  essentially  the  same.  This  component, 
depicted  again  with  a  solid  line,  corresponds  with  the  chromatographic  feature  located  by 
ANOVA  feature  selection. 

The loadings in the mass spectral mode for this component of the PARAFAC model are
shown in Figure 14, with major peaks at m/z = 91, 115, 117, 131, 145, and 160. These extracted
mass spectral data were written to an appropriately formatted text file, which was then
passed to the NIST Mass Spectral Search Program for the NIST/EPA/NIH Mass Spectral Library
(version 2.0), which returned automated library search results for the submitted data. A
molecular structure assignment (5-ethyl-1,2,3,4-tetrahydronaphthalene) was obtained via the NIST
mass spectral library for the PARAFAC model component in question. This illustrates the
potential diagnostic capability of this approach to detect and characterize those trace fuel
constituents that have changed during use or thermal stress.


Summary  of  Results 

When  employing  multidimensional  chemometric  techniques  to  perform  discriminant  analyses 
of  complex  fuel  composition  datasets,  it  is  imperative  that  measures  be  taken  to  avoid 
unintentional  weighting  of  results.  Thus,  we  have  to  be  very  circumspect  in  how  we  define  what 
constitutes  a  statistically  significant  change  in  composition,  while  avoiding  interpretations  that 
may  be  skewed  by  preconceived  ideas  of  which  chemical  processes  are  relevant.  Another  aspect 
of  this  methodology  that  is  critical  is  the  establishment  of  the  ability  of  the  numerical  data  to 
accurately  represent  the  magnitude  of  change,  i.e.,  the  linearity  of  instrumental  response  to 
compositional  change. 

The  ANOVA  analysis  has  been  shown  to  be  capable  of  extracting  the  significant  information 
from  complex  compositional  data.  It  was  clearly  shown  in  analysis  of  the  oven-stressed  diesel 
fuel  samples  that  the  chemical  variations  were  dominated  by  evaporative  loss  of  the  more  volatile 
fuel  components,  rather  than  those  that  are  lost  or  gained  due  to  fuel  degradation.  Utilization  of 
LPR  stress  conditions  minimized  this  effect,  and  allowed  for  a  better  examination  of  changes  in 
fuel  composition  due  to  degradation. 

Fuel  degradation  in  samples  subjected  to  LPR  stress  under  the  conditions  described  was 
readily  observed  and  modeled  as  changes  in  fuel  composition  as  monitored  by  GC-MS  analysis  of 
fuel  samples.  The  ANOVA  based  feature  selection  was  able  to  locate  the  features  that  changed 
from  sample  to  sample,  thus  allowing  for  a  quick  evaluation  of  how  fuel  composition  was  altered 
during  stress,  as  well  as  aiding  in  constructing  hypotheses  as  to  the  mechanisms  of  this  change. 
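
The point-wise ANOVA feature selection referred to above can be sketched as follows. This is a minimal illustration in Python rather than the MATLAB code used in this work; the function names and the toy data are hypothetical, and the threshold of 100 is taken from the Figure 12 caption. For each data point, the ratio of between-class to within-class variance (the F-ratio) is computed across replicate analyses, and only points exceeding the threshold are retained.

```python
from statistics import mean


def f_ratio(groups):
    """One-way ANOVA F-ratio for a single data point.

    groups: one list of replicate intensities per sample class.
    """
    k = len(groups)                      # number of classes
    n = sum(len(g) for g in groups)      # total number of measurements
    grand = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_features(classes, threshold=100.0):
    """Return indices of data points whose F-ratio exceeds the threshold.

    classes: list of classes; each class is a list of replicate signals,
    and every signal is a list of intensities of the same length.
    """
    n_points = len(classes[0][0])
    selected = []
    for j in range(n_points):
        groups = [[replicate[j] for replicate in cls] for cls in classes]
        if f_ratio(groups) > threshold:
            selected.append(j)
    return selected


# Hypothetical toy data: two classes, three replicates each, four data points.
unstressed = [[10.0, 5.0, 8.0, 1.0], [10.1, 5.2, 8.1, 1.1], [9.9, 4.9, 7.9, 0.9]]
stressed = [[10.0, 5.1, 2.0, 1.0], [10.2, 4.8, 2.1, 1.2], [9.8, 5.0, 1.9, 0.8]]
selected = select_features([unstressed, stressed])
print(selected)  # prints [2]
```

Only the third data point changes between classes by much more than the replicate scatter, so it is the only one retained.
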

Figure 12. A) Feature-selected TIC of unstressed fuel from the LPR stress study, including only
data points with an ANOVA f-ratio greater than 100. Offset from the feature-selected TIC is the
raw (i.e., non-feature-selected) TIC of the same fuel sample. B) Enlarged region of the
feature-selected TIC.

Figure 13. PARAFAC decomposition of the boxed region in Figure 12B. Unimodality and
nonnegativity constraints were imposed on the chromatographic mode, and a nonnegativity
constraint was imposed on the mass spectral mode. A) Resulting deconvoluted GC profiles
showing four discrete Gaussian peaks. B) Deconvoluted concentration profiles, with only one
component decreasing in concentration with stress.

Figure 14. Mass spectral loadings of the PARAFAC model component that decreased in
concentration during LPR stress. A tentative identification of this compound as
5-ethyl-1,2,3,4-tetrahydronaphthalene was made via a NIST mass spectral library match.


In this experiment, it appears that this diesel fuel, stressed with and without high levels of
copper, approached the same chemical composition “end point,” although the mechanism and rate
of change may have been different. This raises the question of whether copper serves simply as
an autoxidation accelerant, without imposing significant changes in the autoxidation mechanism.
Such a finding would imply that copper could be used to accelerate fuel stability testing in the
laboratory.

Identification  of  a  number  of  individual  fuel  constituents  that  change  significantly  during 
LPR  stress  may  be  possible  through  PARAFAC  decomposition  of  local  regions  of  GC-MS  data 
that have been identified as significant by ANOVA-based feature selection. This approach has
potential both as a diagnostic tool and as a means of more completely understanding the
complex processes that occur as a fuel degrades. This type of analysis could eventually provide
the  means  to  characterize  compositional  changes  in  fuels  associated  with  degradation  to  an 
unprecedented  level  of  detail.  This  level  of  detail  and  understanding  of  the  fuel  degradation 
process  is  necessary  to  develop  reliable  and  robust  models  to  predict  fuel  quality  that  are 
functional  for  more  than  one  type  of  fuel.  Moreover,  by  incorporating  structure  assignment 
functionality  along  with  localized  PARAFAC  modeling  and  appropriate  data  preprocessing,  it 
would  theoretically  be  possible  to  provide  a  means  of  automating  the  process  of  analyzing  the 
GC-MS  datafiles  to  provide  rapid  compositional  profiles  for  evaluating  fuel  samples. 


3.3 Characterization of Fuel Blends by GC-MS and Chemometric Tools


This  study  was  undertaken  to  test  our  hypothesis  that  second  order  chemometric  tools  have  an 
even  greater  potential  to  characterize  and  quantify  fuel  components  in  complex  mixtures. 
Demonstrated  here  is  an  application  that  illustrates  the  potential  of  three  multi-way  chemometric 
analysis  tools  to  rapidly  and  effectively  characterize  the  composition  of  different  fuel  blends. 

Experimental 

Fuel Blends. In the first experiment, a series of diesel fuel and light cycle oil (LCO) blends
was examined. The blends contained 0, 1, 5, 10, 20 and 100 percent LCO by volume,
with the remainder made up of a Navy specification F-76 diesel fuel (DFM). In a second
experiment, three samples were examined: one neat DFM sample, one DFM adulterated with
home heating oil (HHO), and a blend of equal volumes of the two.

GC-MS Analysis. Samples for GC-MS analysis were prepared by diluting 2 µL of each
sample with 2 mL dichloromethane. An HP 6890 injector and autosampler delivered 1.0 µL
aliquots of each of five replicate samples in random order to an Agilent model 5890 capillary gas
chromatograph coupled to an HP 5971 mass selective detector. A split/splitless injector at 250°C
with a split flow ratio of 60:1 was used along with a 50 m × 0.2 mm Agilent HP-1
(dimethylpolysiloxane) capillary column. The oven temperature profile was 50°C for one minute,
then a ramp to 290°C at 10°C/min, followed by a seven-minute hold, giving a run time of 32
minutes. A solvent delay of four minutes was used, which reduced the data acquisition time to
28 minutes per run. The GC-MS data acquired from these runs were converted from the native HP
Chemstation datafiles to raw text format utilizing an in-house written MS Windows program and
then  imported  into  MATLAB  for  subsequent  chemometric  analyses. 
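
The run time and acquisition time quoted above follow directly from the oven program. The short Python check below is only an illustration of that arithmetic; the function name is hypothetical and not part of the original analysis software.

```python
def oven_run_time(initial_hold_min, start_c, end_c, ramp_c_per_min, final_hold_min):
    """Total run time (minutes) of a single-ramp GC oven program."""
    ramp_min = (end_c - start_c) / ramp_c_per_min
    return initial_hold_min + ramp_min + final_hold_min


# 50 C for 1 min, ramp to 290 C at 10 C/min, hold 7 min.
run_time = oven_run_time(1.0, 50.0, 290.0, 10.0, 7.0)

# A 4-minute solvent delay shortens the period over which data are acquired.
acquisition_time = run_time - 4.0

print(run_time, acquisition_time)  # prints 32.0 28.0
```
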

Results  and  Discussion 


Identification  of  LCO  components  in  LCO/DFM  blends.  The  task  of  detecting  different  fuel 
types  in  a  fuel  blend  by  gas  chromatography  can  be  limited  by  the  overall  differences  in  GC 
features  from  the  different  component  fuels.  As  the  overall  differences  between  the  compositions 
of  the  component  fuels  in  a  mixture  become  smaller,  it  becomes  more  difficult  to  discriminate  the 
components  of  a  blend.  Thus,  the  greatest  limitations  in  this  approach  are  generally  encountered 
in  situations  where  a  fuel  may  be  contaminated  by  a  small  amount  of  a  slightly  different  fuel. 
Chemometric  analysis  of  GC  data  from  mixtures  can  be  successfully  used  to  discriminate 
between  different  fuel  types,  but  is  subject  to  limitations  imposed  by  the  depth  of  information 
provided  by  the  technique. 

By extending this approach to a second order technique, each fuel component is more
completely defined, and GC-MS can thus provide a more distinctive numerical representation of
the fuels that comprise the sample. A significant challenge that must be overcome in the
treatment of second order data is the discrimination of statistically significant data within the
complex dataset. By employing the ANOVA feature selection at each point as described above,
regions  of  the  GC-MS  data  can  be  defined  that  are  adept  at  describing  the  chemical  differences 
between  different  fuel  blends.  When  the  GC-MS  of  a  neat  DFM  sample  was  compared  to  a 
blend  of  the  same  fuel  adulterated  with  20%  by  volume  of  an  LCO,  the  ANOVA  feature  selection 
successfully  extracted  the  total  ion  chromatogram  (TIC)  of  the  LCO,  from  the  larger  quantities  of 
DFM  components,  as  shown  in  Figure  15. 

The individual compounds in the LCO contaminant can also be identified by computing a
series of successive PARAFAC models on windowed regions of the retention time axis of the
GC-MS dataset. PARAFAC models were thus constructed on successive 30-scan windows along
the retention time axis, stepping the window in 20-scan increments at each iteration. A
five-factor model was then calculated for each local region, and LCO components were identified
by locating factors in the PARAFAC model where the sample mode loadings varied significantly
between the neat and adulterated DFM samples. The sample mode loadings are shown in Figure
16A, which depicts the total ion chromatograms from a series of four neat DFM samples and four
DFM/LCO samples. The loading shown with the dashed line shows the most change and
represents the LCO. The retention time axis loadings in Figure 16B depict the same three
components,  with  the  dashed  line  showing  how  the  ion  chromatogram  of  just  the  LCO  was 
discriminated  from  the  other  blend  components. 
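
The windowing scheme described above (30-scan windows stepped in 20-scan increments) can be sketched in Python as follows. The function name is hypothetical, and the handling of scans left over at the end of the chromatogram is an assumption, since the text does not say how a final partial window was treated. The PARAFAC fit itself is not shown.

```python
def scan_windows(n_scans, width=30, step=20):
    """Return (start, stop) scan-index pairs, stop exclusive, for successive windows.

    Assumption: only complete windows are returned, so trailing scans that do
    not fill a whole window are left out.
    """
    windows = []
    start = 0
    while start + width <= n_scans:
        windows.append((start, start + width))
        start += step
    return windows


windows = scan_windows(100)
print(windows)  # prints [(0, 30), (20, 50), (40, 70), (60, 90)]
```

With these settings, consecutive windows overlap by 10 scans, so a peak that straddles one window boundary falls wholly inside a neighboring window.
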

Comparison of this extracted ion chromatogram for the LCO with the TIC illustrates how
this approach can successfully extract the chromatographic data for a single component from a
complex mixture. Accordingly, as shown in Figure 16C, the spectral loadings yield the mass
spectrum of the chemical species that produced the extracted LCO ion chromatogram peak.
Submitting the spectral mode loadings of those factors to a NIST mass spectral library matching
algorithm provides a list of compound matches. A partial list of components identified in this
manner is shown in Figure 17, which shows a portion of the TIC and the LCO components.
Ultimately, these procedures could be exported to a stand-alone computer application that would
start with raw GC-MS datafiles and provide the analyst with a list of chemical compounds that
differ between the two fuels or blends being compared. If a sufficient training set of
fuel types were developed, it would also be possible to adapt this process to provide a means of
performing automated assays of contaminated fuel samples from GC-MS data.



Figure  15.  Total  ion  current  chromatogram  of  a  DFM  fuel  sample  blended  20%  by  volume  with 
LCO  (gray)  overlaid  with  feature-selected  total  ion  current  chromatogram  (black). 

Quantitative determinations of blending components. Since the model loadings can isolate
the individual components in a fuel blend, this approach can also be used to derive quantitative
information about the detected components. The overall LCO contents of the LCO/DFM
blends were successfully modeled by an NPLS regression calibration. As stated earlier, NPLS
differs from standard PLS regression in that the data are not unfolded into a 2-dimensional
matrix, but are treated as a 3-dimensional data cube and decomposed into a PARAFAC-like
trilinear model. By retaining all the spatial information, much like PARAFAC, a significant
improvement  is  obtained  in  the  predictive  power  of  the  model.  GC-MS  data  sets  were  first 
boxcar  averaged  by  eight  points  along  the  retention  time  axis,  to  reduce  data  size  and  increase  the 
speed  of  calculations.  The  predictive  power  of  the  resultant  model  was  cross-validated  by 
predicting  the  concentration  of  each  blend  with  a  calibration  model  constructed  from  a  dataset 
consisting  of  the  five  GC-MS  replicate  analyses  of  each  of  the  remaining  samples.  The  results  of 
these  predictions  are  shown  graphically  in  Figure  18,  as  the  predicted  vs  actual  volume  percent  of 
LCO.  The  overall  variance  of  prediction  represented  by  this  graph  is  14.4,  although  this  number 
drops  to  0.5  if  the  samples  at  either  end  of  the  LCO  concentration  range  are  excluded.  This  is  not 
particularly  surprising,  as  the  calibration  models  used  to  predict  both  of  these  concentrations  are 
extrapolated.  In  this  manner,  a  quantitative  determination  of  the  amount  of  LCO  in  the  DFM 
samples  was  obtained  from  the  GC-MS  analysis. 
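
Two of the data-handling steps described above, boxcar averaging along the retention time axis and cross-validation in which all replicates of one blend are held out together, can be sketched in Python as follows. The function names are hypothetical. Treating the boxcar average as a non-overlapping block average is an assumption, chosen because the stated purpose was to reduce data size. The NPLS model itself is not shown.

```python
def boxcar_average(signal, window=8):
    """Average consecutive, non-overlapping blocks of `window` points.

    Assumption: trailing points that do not fill a complete block are dropped.
    """
    n_blocks = len(signal) // window
    return [sum(signal[i * window:(i + 1) * window]) / window for i in range(n_blocks)]


def leave_one_blend_out(labels):
    """Yield (held_out_label, train_indices, test_indices) for each blend.

    labels: blend label (e.g. volume percent LCO) of every replicate analysis.
    All replicates of the held-out blend go to the test set together.
    """
    for held in sorted(set(labels)):
        train = [i for i, lab in enumerate(labels) if lab != held]
        test = [i for i, lab in enumerate(labels) if lab == held]
        yield held, train, test


# Six blends with five replicate analyses each, as in the LCO/DFM experiment.
labels = [pct for pct in (0, 1, 5, 10, 20, 100) for _ in range(5)]
splits = list(leave_one_blend_out(labels))
print(len(splits), len(splits[0][1]), len(splits[0][2]))  # prints 6 25 5

print(boxcar_average(list(range(16))))  # prints [3.5, 11.5]
```

When the 0% or 100% blend is held out, every calibration sample lies on one side of the held-out concentration, which is the extrapolation noted in the text.
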

Qualitative recognition of fuels in a blend. An obvious application for GC-MS modeling of
fuel mixtures would be the qualitative determination of the components that are present. A
multi-way principal component analysis (MPCA) model was constructed using GC-MS chromatograms
from  replicate  analyses  of  three  different  fuel  samples,  a  neat  DFM,  a  separate  DFM  sample 
known  to  be  adulterated  with  an  unknown  quantity  of  home  heating  oil  (HHO),  and  a  mixture  of 
equal  volumes  of  the  two.  Scores  from  this  model  are  shown  in  Figure  19,  and  indicate  a  clear 
delineation between the samples. An examination of the loadings on component 1 shows that
HHO components are represented by the positive loadings, while components more characteristic
of the DFM fuel are modeled by the negative loadings. The MPCA loadings on the first principal
component for the entire GC-MS datasets from these DFM and adulterated samples are depicted in
Figure 20. By depicting these loadings as projections on the
scan  number  vs  m/z  axes,  it  is  evident  that  the  GC-MS  data  for  the  DFM  /  HHO  blends  were 
successfully  isolated,  as  positive  loadings  in  Figure  20A,  and  the  DFM  itself  in  Figure  20B  from 
the  negative  loadings  of  the  model.  This  illustrates  the  degree  to  which  the  MPCA  models  can 
distinguish  between  these  two  very  similar  fuels,  on  the  basis  of  their  entire  composition,  rather 
than  just  relying  on  selected  targeted  constituents  to  infer  the  presence  of  the  two  fuel  types. 


Summary  of  Results 

Significant advantages in sensitivity and selectivity can be realized when analyzing mixtures
of fuels by using multidimensional data. The ANOVA feature selection algorithm is the key to
the utilization of complex datasets, such as the chromatographically resolved data obtained from a
GC-MS analysis. A significant advantage of using GC-MS data over first-order spectroscopic or
single-channel gas chromatographic measurements is the increased level of detail provided by the
analytical data. Discrimination of fuel types in mixtures is a common need in remediation of fuel
spills and the examination of fuel contamination issues. Available methods for performing these
types of discrimination analyses generally rely on either spectroscopic or chromatographic
analyses. However, these methods are limited when the compositions of the components of a
mixture overlap or are very similar. In this study, we show that many of these limitations can be
overcome by including the additional dimension of compositional information provided by
GC-MS. The novel multidimensional chemometric techniques described above were successfully
used to analyze GC-MS data and resolve the multi-component properties of fuel blends into their
component parts. The sensitivity and accuracy of this approach are illustrated by the successful
quantitative and qualitative discrimination of the components of diesel fuel (NATO F-76)
samples contaminated with light cycle oil and with another similar diesel fuel.

Figure 16. PARAFAC analysis of a local region of GC-MS data to deconvolve chemical
components. (A) Sample mode loadings, (B) retention time axis mode loadings, with overlay of
TIC in bold, and (C) deconvolved mass spectrum of PARAFAC model component that changes
between samples.

Figure 17. LCO components identified in feature-selected TIC via NIST library matching of
PARAFAC-deconvolved spectra.

Figure 18. NPLS calibration of LCO content in DFM/LCO blends. Variance of prediction was
14.39, but reduced to 0.50 if results from prediction of concentration extremes (neat LCO and
DFM samples) are omitted.

Figure 19. Scores on the first principal component from an MPCA model, showing the
discrimination of three blends of Naval Distillate (DFM) with home heating oil (HHO).

This  approach  offers  the  means  to  more  effectively  characterize  fuel  blends  and 
contaminants,  by  comparison  with  the  known  fuel.  If  the  known  fuel  components  are  not 
available,  an  estimated  discrimination  can  still  be  made  by  comparison  with  typical  specification 
fuels.  This  could  potentially  provide  the  capability  to  detect  and  characterize  unknown  fuel 
adulterants,  by  comparison  with  known  samples  of  the  unadulterated  base  fuel. 

The predictive power of chemometric modeling of GC-MS data is, of course, limited to those
fuel constituents that can be chromatographically separated and detected. However, the
multidimensional  chemometric  modeling  techniques  described  in  this  work  would  not  be  limited 
to  GC-MS  and  could  form  the  basis  for  the  development  of  optically  based  sensing  systems  to 
monitor  fuel  cleanliness  and  contamination. 

Figure 20. MPCA model of DFM/HHO blends. The two components in the mixture are clearly
shown from the loadings on the first principal component: (A) positive loadings, which represent
the home heating oil, and (B) negative loadings, which represent the diesel fuel.


4.0 EVALUATING THE PREDICTIVE POWERS OF SPECTROSCOPY AND
CHROMATOGRAPHY FOR FUEL QUALITY ASSESSMENT


Currently,  fuel  quality  in  the  field  or  onboard  ship  is  assessed  with  a  series  of  traditional 
ASTM  fuel  test  procedures.  A  sensor-based  device  to  perform  these  tests  would  not  only  provide 
significant  savings  in  cost  and  manpower,  but  reduce  the  hazards  associated  with  handling  large 
volumes  of  fuel  samples.  It  would  also  provide  faster,  and  in  many  cases,  more  consistent  results. 
This  phase  of  the  program  was  thus  focused  on  developing  sensing  technologies  to  perform  fuel 
quality  assessment  and  diagnostics.  This  technology  is  based  upon  the  prediction  of  critical  fuel 
properties  from  an  array  of  optical  and  other  specialized  sensors.  These  predictions  will  take 
advantage  of  predictive  models  derived  from  chemometric  analysis  of  the  data  stream. 

The predictive power of chemometric regression models based on chromatographic data is
compared with that of models generated from NIR and Raman spectroscopic data to evaluate the
potential of these analytical techniques for the development of an advanced fuel quality sensor
system. A
training  set  of  fuel  samples  from  around  the  world  was  acquired,  with  complete  compositional 
and  specification  test  results.  These  samples  were  analyzed  by  NIR,  Raman  and  GC-MS.  These 
data  were  evaluated  for  their  ability  to  predict  various  fuel  properties  via  PLS  as  part  of  an  effort 
directed  towards  developing  robust,  sensor-based  fuel  quality  assessment  methodologies. 


Experimental 

Fuel sample set. A set of 45 jet fuels sampled from around the world was used in
this initial survey. The set consisted of Jet A (11 samples), Jet A-1 (22 samples), JP-8 (9
samples), JP-5 (2 samples) and a petroleum (Stoddard) solvent (1 sample). The samples were
supplied with measured values for 28 fuel specification properties, and all fuel samples met the
appropriate specifications. The ranges of property values reported for this fuel set are given in
Table 5.

NIR  Spectroscopy.  Near-infrared  spectra  were  obtained  with  a  Cary  model  5E 
spectrophotometer. Supracell cells with path lengths of 10 and 1 mm were used. Initial spectra
were  obtained  from  300  to  2300  nm,  with  a  resolution  of  1  nm.  For  chemometric  analysis,  the 
spectral  region  from  1000  to  2300  nm  was  used.  Repeatability  of  spectra  was  excellent,  with  the 
exception  of  some  baseline  variations  observed  when  the  1  mm  sample  cell  was  used,  presumably 
due  to  slight  differences  in  positioning  the  cell  in  the  beam.  These  baseline  variations  with  the  1 
mm  cell  data  were  corrected  using  the  multiplicative  scatter  correction  function  in  the  PLS 
Toolbox  for  MATLAB  prior  to  multivariate  analysis.  Preliminary  comparisons  between  the  1 
mm  and  10  mm  data  indicated  no  significant  advantage  to  the  1  mm  cell,  and  so  its  use  was 
discontinued.  Data  were  collected  with  the  Cary  software  provided  with  the  instrument,  and 
exported  in  comma  separated  value  (CSV)  format.  The  resultant  numerical  representations  of  the 
spectra  were  combined  in  one  array. 

Raman  Spectroscopy.  Raman  spectra  for  a  30  fuel  subset  of  the  total  sample  set  were 
provided  by  Real-Time  Analyzers,  Inc.  (Middletown,  CT),  and  were  acquired  with  a  portable 
scanning FT-Raman spectrometer of their manufacture. Spectra were acquired from 500 to 3500
cm⁻¹ with a resolution of 1 cm⁻¹.

GC-MS Analysis. Samples for GC-MS analysis were prepared by diluting 2 µL of each
sample with 2 mL dichloromethane. An autosampler injected 1.0 µL aliquots of each of five
replicate samples in random order to an Agilent model 5890 capillary gas chromatograph coupled
to an HP 5971 mass selective detector. A split/splitless injector at 250°C with a split flow ratio of
60:1 was used along with a 50 m × 0.2 mm Agilent HP-1 (dimethylpolysiloxane) capillary
column. The oven temperature profile was 50°C for one minute, then a ramp to 290°C at
10°C/min, followed by a seven-minute hold, giving a run time of 32 minutes. A solvent delay of
four minutes was used, which reduced the data acquisition time to 28 minutes per run. Masses
were scanned from m/z 40 to 240. The GC-MS data acquired from these runs were converted from
the native HP Chemstation data files to raw text format utilizing an in-house written MS Windows
program. The chromatograms were then aligned to one another to minimize retention time
variations from sample to sample via a standalone MS Windows program implementing the
correlation optimized warping algorithm described by Vest Nielsen (available for download at
http://www.biocentrum.dtu.dk/mycology/analysis/cow/).

Chemometric Regression. PLS and principal components regression (PCR) were performed
utilizing the NIR spectra, Raman spectra, GC total ion current chromatograms, and unfolded
GC-MS chromatograms against the 28 measured fuel properties. Both PLS and PCR are inverse least
squares regression models that use factor analysis to reduce the spectral or chromatographic data
prior to regression.33 PCR projects the input data onto a lower dimensional subspace calculated
to most efficiently represent the sample-to-sample variation contained within the calibration data,
whereas  PLS  projects  the  input  data  onto  a  lower  dimensional  subspace  calculated  to  best 
represent  the  covariance  between  the  calibration  data  and  corresponding  reference  values.  These 
two  regression  techniques  were  chosen  for  consideration  as  they  are  well  established,  well 
characterized,  and  widely  implemented  in  various  software  packages.  Additionally,  multiway 
partial  least  squares  regression  was  used  to  regress  GC-MS  datasets  against  the  provided  fuel 
properties  for  each  of  the  45  fuels  in  the  sample  set. 

NIR  and  Raman  spectra  were  assembled  into  matrices  in  which  each  row  was  a  spectrum  of  a 
different  fuel  sample.  The  acquired  dataset  for  GC-MS  analysis  consisted  of  a  series  of  two- 
dimensional  GC-MS  chromatograms,  one  for  each  sample  analyzed,  stacked  on  each  other  to 
form  a  three-dimensional  array,  or  cube  of  data.  Total  ion  current  (TIC)  chromatograms  were 
constructed  by  summing  each  GC-MS  dataset  along  the  m/z  axis.  “Unfolded”  GC-MS 
chromatograms  were  created  by  reshaping  the  data  matrix  for  each  GC-MS  chromatogram  into  a 
single  row  vector.  Prior  to  unfolding,  each  GC-MS  chromatogram  was  boxcar  averaged  with  a 
window  of  five  points. 
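
The construction of TIC and "unfolded" chromatograms from a single GC-MS data matrix can be sketched in Python as follows. The function names and toy data are hypothetical, and the row-by-row (scan-major) ordering of the unfolded vector is an assumption; MATLAB's reshape is column-major, so the ordering in the original work may differ, although any fixed ordering applied consistently to all samples serves the same purpose.

```python
def total_ion_current(chromatogram):
    """Sum each scan over the m/z axis.

    chromatogram: list of scans, each a list of intensities over m/z channels.
    """
    return [sum(scan) for scan in chromatogram]


def unfold(chromatogram):
    """Reshape a scans-by-m/z matrix into a single row vector (scan-major order)."""
    return [intensity for scan in chromatogram for intensity in scan]


# Hypothetical toy chromatogram: 2 scans x 3 m/z channels.
gcms = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]]

tic = total_ion_current(gcms)
row = unfold(gcms)
print(tic)  # prints [6.0, 15.0]
print(row)  # prints [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

Stacking one such row vector per sample gives the two-dimensional matrix required by PLS and PCR, whereas N-PLS operates on the three-dimensional array directly.
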

PLS  and  PCR  algorithms  were  implemented  utilizing  the  PLS  Toolbox.  Calibration  models 
were  evaluated  utilizing  “leave-one-out”  cross  validation  in  which  the  property  value  of  each 
sample  is  predicted  utilizing  a  calibration  model  built  from  all  of  the  other  data.  Regression 
models  were  built  utilizing  from  one  to  ten  latent  variables  (or  components)  and  the  model  with 
the  lowest  root  mean  square  error  of  cross  validation  (RMSECV)  was  chosen  for  inclusion  into 
the results. A limit of 10 latent variables was imposed to guard against over-fitting the data with
excessively complex models, and provided us with a maximum ratio of roughly five fuel samples
per latent variable in a regression model. For purposes of comparison between models of different
properties,  RMSECV  values  were  normalized  by  the  mean  observed  for  the  fuel  property  they 
were  predicting. 
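
The leave-one-out procedure and the mean-normalized RMSECV described above can be sketched in Python as follows. To keep the example self-contained, a univariate least-squares line stands in for the PLS and PCR models, which is an assumption made purely for illustration; the function names and toy data are likewise hypothetical.

```python
from math import sqrt
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares line; a stand-in for the PLS/PCR calibration model."""
    xbar, ybar = mean(xs), mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return slope, ybar - slope * xbar


def rmsecv_loo(xs, ys):
    """Root mean square error of leave-one-out cross validation."""
    squared_errors = []
    for i in range(len(xs)):
        # Build the model from every sample except sample i.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        predicted = slope * xs[i] + intercept
        squared_errors.append((predicted - ys[i]) ** 2)
    return sqrt(mean(squared_errors))


def normalized_rmsecv(xs, ys):
    """RMSECV divided by the mean observed property value."""
    return rmsecv_loo(xs, ys) / mean(ys)


# Hypothetical toy data: one predictor and one property value per sample.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(rmsecv_loo(x, y), normalized_rmsecv(x, y))
```

In the procedure described in the text, this calculation would be repeated for models with one to ten latent variables, keeping the model with the lowest RMSECV.
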


33  Richard  Kramer,  “Chemometric  Techniques  for  Quantitative  Analysis,”  Marcel  Dekker,  New 
York,  1998. 

ASTM    Property Name                      min       max     range      mean       std
D4052   density at 60 °F, g/ml            0.78      0.82      0.04      0.80      0.01
D93     flash point (P-M), °F           105.00    154.00     49.00    119.69     11.94
D3828   flash point (mini), °F          103.00    144.00     41.00    120.11     10.35
D5972   freeze point, °C                -72.00    -44.00     28.00    -52.31      5.62
D5949   pour point, °C                  -80.00    -60.00     20.00    -69.20      2.84
D2622   total sulfur, ppm                 7.00      2453      2446    417.64    432.21
D1840   naphthalenes, vol %               0.00      3.80      3.80      1.30      0.76
D1319   aromatics, vol %                 11.80     22.00     10.20     17.88      2.13
D6379   aromatics, vol %                 13.00     24.40     11.40     19.57      2.40
D1319   saturates, vol %                 75.80     87.00     11.20     80.50      2.13
D1159   olefins, vol %                    0.06      1.53      1.47      0.35      0.25
D1319   olefins, vol %                    0.70      2.30      1.60      1.62      0.35
D3701   hydrogen, weight %               13.71     14.47      0.76     14.15      0.19
D4809   net heat content, btu/lb         18331     18589       258     18506     52.08
D445    viscosity 20 °C, mm²/sec          1.30      3.00      1.70      1.78      0.30
D445    viscosity -20 °C, mm²/sec         2.70      6.20      3.50      4.18      0.81
D445    viscosity -40 °C, mm²/sec         4.80     14.60      9.80      8.59      2.38
D1218   refractive index                  1.44      1.46      0.02      1.45      0.00
D2624   conductivity, pS/m                0.00    395.00    395.00     93.11    101.11
D3242   acid number, mg KOH/g           0.0000    0.0200    0.0200    0.0055    0.0035
D3241   thermal stability, °F           265.00    370.00    105.00    286.89     16.90
D5001   lubricity, mg/L                   0.54      0.71      0.17      0.62      0.04
D86     initial boiling point, °F       294.10    362.80     68.70    318.17     15.83
D86     10% distillation, °F            329.10    388.60     59.50    347.66     15.79
D86     20% distillation, °F            335.20    400.00     64.80    358.81     16.52
D86     50% distillation, °F            346.40    439.00     92.60    391.34     20.45
D86     90% distillation, °F            372.40    488.30    115.90    454.68     22.65
D86     final boiling point, °F         386.80    521.70    134.90    481.01     25.00

Table  5.  Summary  of  fuel  property  data  ranges  and  standard  deviations  for  each  tested  property. 


Six preprocessing strategies were examined for each dataset. For spectroscopic data, these
strategies were as follows: 1) no preprocessing, 2) second derivative, 3) autoscaling, 4)
mean-centering, 5) second derivative followed by autoscaling, and 6) second derivative followed
by mean-centering. Second derivative transformation was implemented through the Savitzky-Golay
filter algorithm in the PLS Toolbox. For chromatographic data, the preprocessing strategies
tested  were  as  follows:  1)  no  preprocessing,  2)  normalization,  3)  autoscaling,  4)  mean-centering, 
5)  normalization  followed  by  autoscaling,  and  6)  normalization  followed  by  mean-centering. 
Normalization  was  implemented  by  dividing  each  individual  chromatogram  by  its  Euclidean 
norm,  and  was  implemented  to  minimize  any  injection  volume  variation  from  run  to  run.  Thus, 
for  each  preprocessing  scheme,  and  each  regression  technique,  an  optimized  regression  model 
using  up  to  10  latent  variables  was  calculated  and  the  RMSECV  of  that  model  was  recorded. 
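
Three of the preprocessing operations listed above (normalization to unit Euclidean norm, mean-centering, and autoscaling) can be sketched in Python as follows. This is an illustration only, not the PLS Toolbox implementation; the function names are hypothetical, and the use of the sample standard deviation (n - 1 denominator) for autoscaling and the handling of zero-variance variables are assumptions.

```python
from math import sqrt
from statistics import mean, stdev


def normalize_rows(X):
    """Divide each chromatogram (row) by its Euclidean norm."""
    result = []
    for row in X:
        norm = sqrt(sum(v * v for v in row))
        result.append([v / norm for v in row] if norm > 0 else list(row))
    return result


def mean_center(X):
    """Subtract the mean of each variable (column) from that column."""
    means = [mean(col) for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]


def autoscale(X):
    """Mean-center each column, then scale it to unit standard deviation.

    Assumption: columns with zero variance are set to zero instead of dividing by zero.
    """
    columns = list(zip(*X))
    means = [mean(col) for col in columns]
    stds = [stdev(col) for col in columns]
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
            for row in X]


# Hypothetical toy data: two samples (rows) by two variables (columns).
X = [[1.0, 2.0],
     [3.0, 6.0]]
print(mean_center(X))  # prints [[-1.0, -2.0], [1.0, 2.0]]
print(normalize_rows([[3.0, 4.0]]))
print(autoscale(X))
```

The combined strategies in the list correspond to composing these functions, for example autoscale(normalize_rows(X)) for normalization followed by autoscaling.
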

Additionally, N-PLS was used to build regression models to predict fuel properties with
entire GC-MS chromatograms. GC-MS data were boxcar averaged along the retention time axis
with a window of ten points, and the mass spectral axis of the data was truncated to the first 100
masses acquired (m/z of 40 to 139) in order to speed calculations.


Results and Discussion


The  fuel  sample  densities  predicted  from  PLS  regression  of  the  Raman  spectra  are  plotted 
against the measured values in Figure 21. Reasonably good agreement was obtained
when the data were mean centered and regressed with ten latent variables. The predicted vs
measured aromatic contents of the fuel samples from PLS regression of NIR spectra are shown in
Figures 22 and 23, for measurements obtained by the HPLC method34, ASTM D6379, and the fluorescent
indicator  absorption  (FIA)  method35,  ASTM  D1319,  respectively.  The  PLS  regressions  were 
performed  on  the  mean  centered  data,  using  seven  latent  variables.  As  shown,  good  agreement 
was  obtained  between  the  predicted  and  measured  values  for  aromatic  content  by  HPLC  and  by 
the  FIA  method.  However,  when  the  HPLC  measurements  are  plotted  against  the  corresponding 
FIA  measurements  in  Figure  24,  it  is  evident  that  the  HPLC  values  were  systematically  higher. 
This  illustrates  the  ability  of  PLS  correlation  modeling  to  derive  reasonably  accurate  predictions 
from  the  same  set  of  spectral  data  for  two  different  measurements  of  a  single  property,  even  if  the 
results  of  those  two  techniques  are  not  in  complete  agreement  with  each  other.  Clearly,  both  the 
HPLC  and  the  FIA  methods  are  self-consistent,  but  it  is  important  to  specify  which  ASTM 
method  is  being  used,  since  the  models  derived  in  this  manner  are  only  representative  of  the  data 
used  in  the  training  set. 

Assessment  of  prediction  errors.  The  error  of  prediction  of  a  chemometric  regression  model 
is  a  function  of  the  uncertainty  in  the  original  ASTM  reference  values  as  well  as  of  the  error 
associated  with  the  analytical  technique  used  to  acquire  spectroscopic  or  chromatographic  data 
utilized  for  model  building.  One  would  expect  that  the  error  in  prediction  of  a  good  regression 
model  would  be  comparable  in  magnitude  to  the  uncertainty  of  the  reference  values  used  to 
construct  it.  Accordingly,  the  chemometric  regression  models  were  first  evaluated  to  see  how 
their  rates  of  error  compared  to  the  reproducibility  and  repeatability  values  of  the  respective 
ASTM  methods  that  were  used  to  acquire  the  reference  measurements.  Reproducibility  and 
repeatability  values  were  calculated  using  the  formulas  published  in  the  method  specifications 
and  the  mean  property  values  observed  across  the  entire  set  of  fuel  samples.  These  values  are 
compiled in Table 6, along with the root mean square errors of cross-validation (RMSECV) for the
chemometric regression models constructed. The RMSECV values for the PLS and PCR models that
were 10% or less of the mean measured property values are plotted in Figure 25, normalized by
those mean values, along with the published ASTM repeatability values normalized in the same way. An
examination  of  Table  6  shows  that  the  PLS  model  RMSECV  values  do,  in  fact,  tend  to  be  roughly 
similar  in  magnitude  to  the  ASTM  method  values,  with  a  few  notable  exceptions.  The 
corresponding  PCR  model  RMSECV  values  were  similar.  Predictions  made  for  sulfur  content 
and  conductivity  exhibited  much  greater  error  than  would  be  expected  from  the  ASTM  method 
uncertainty  alone.  This  is  unsurprising,  as  none  of  the  analytical  techniques  examined  directly 
probe these fuel properties. Also notable is the fact that both of the flash point measurement
models exhibit significantly smaller error than the ASTM method uncertainty would imply. The
reasons for this are unclear, but could be the result of overfitting in the regression model, higher
than normal precision from the analytical lab that performed the measurements, or some
combination of factors.

34 ASTM. Standard Test Method for Determination of Aromatic Hydrocarbon Types in Aviation
Fuels and Petroleum Distillates - High Performance Liquid Chromatography Method with
Refractive Index Detection. In Annual Book of ASTM Standards; ASTM: Philadelphia, PA,
2002; Vol. 05.04, ASTM D6379-04.

35 ASTM. Standard Test Method for Hydrocarbon Types in Liquid Petroleum Products by
Fluorescent Indicator Adsorption. In Annual Book of ASTM Standards; ASTM: Philadelphia,
PA, 2002; Vol. 05.04, ASTM D1319-03.
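
The comparison between model error and ASTM method precision discussed above can be expressed as a simple ratio. The sketch below uses four rows transcribed from Table 6; the cutoff of 4 is an arbitrary illustrative threshold, not a criterion used in the study, and the names `table6`, `error_ratio`, and `CUTOFF` are hypothetical.

```python
# Values transcribed from Table 6: ASTM reproducibility and the PLS minimum RMSECV
# obtained with NIR, Raman, and GC data (each row is in the units of its ASTM method).
table6 = {
    "Freeze Pt.":   {"reproducibility": 1.3, "rmsecv": {"NIR": 3.6, "Raman": 4.2, "GC": 3.0}},
    "Pour Pt.":     {"reproducibility": 6.8, "rmsecv": {"NIR": 2.7, "Raman": 2.2, "GC": 2.1}},
    "Sulfur":       {"reproducibility": 30,  "rmsecv": {"NIR": 283, "Raman": 233, "GC": 419}},
    "Conductivity": {"reproducibility": 17,  "rmsecv": {"NIR": 77,  "Raman": 92,  "GC": 87}},
}

CUTOFF = 4.0  # hypothetical threshold for "much greater error than the reference method"

def error_ratio(entry):
    """Best (lowest) model RMSECV divided by the ASTM reproducibility."""
    return min(entry["rmsecv"].values()) / entry["reproducibility"]

ratios = {name: error_ratio(entry) for name, entry in table6.items()}
flagged = [name for name, ratio in ratios.items() if ratio > CUTOFF]

for name, ratio in ratios.items():
    print(f"{name:13s} RMSECV / reproducibility = {ratio:.2f}")
print("flagged:", flagged)  # flagged: ['Sulfur', 'Conductivity']
```

A ratio near or below one indicates a model whose cross-validation error is comparable to the uncertainty of the reference method itself.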

Properties amenable to predictive modeling. Table 7 contains RMSECV values for PLS and
PCR model predictions, normalized to mean property value for the sake of
comparison across different properties. The worst performing models are those for olefins
(ASTM D1319), naphthalenes, acid number, total sulfur, olefins (ASTM D1159/D27), and
conductivity, all giving RMSECV values that were approximately 20% or more of the mean property values
being predicted (Table 5). A number of models, however, gave RMSECV values that were less
than  ten  percent  of  the  mean  property  value,  indicating  potential  for  inclusion  into  the  proposed 
automated  fuel  assessment  methodology.  Among  these  properties  were  those  of  interest  in  fuel 
quality  assessment,  shown  in  Table  1:  density,  freeze  point,  pour  point,  flash  point,  aromatic 
content,  and  viscosity. 
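
The screening logic in this paragraph can be sketched as below. The normalized PLS values are transcribed from Table 7; the 10% and 20% limits are those quoted in the text, while the function names, the class labels, and the freeze point mean of -50 used in the last line are hypothetical.

```python
def normalized_rmsecv(rmsecv, mean_value):
    """RMSECV expressed as a fraction of the magnitude of the mean property value."""
    return rmsecv / abs(mean_value)

def classify(norm_rmsecv, good=0.10, poor=0.20):
    """Sort a model into one of three illustrative classes by its normalized RMSECV."""
    if norm_rmsecv < good:
        return "candidate"
    if norm_rmsecv > poor:
        return "poor"
    return "intermediate"

# Normalized PLS RMSECV values transcribed from Table 7.
table7_pls = {
    "Density": 0.00219,
    "Pour point": 0.0297,
    "Aromatics, D6379": 0.0329,
    "Freezing point": 0.0570,
    "Flash point, D93": 0.0597,
    "Viscosity, -20 C": 0.0920,
    "Viscosity, 20 C": 0.136,
    "Naphthalenes": 0.319,
    "Total sulfur": 0.537,
    "Conductivity": 0.840,
}

classes = {prop: classify(value) for prop, value in table7_pls.items()}
for prop, label in classes.items():
    print(f"{prop:18s} {table7_pls[prop]:.4f}  {label}")

# Hypothetical normalization example: an RMSECV of 3.0 degrees about a mean of -50 degrees.
example = normalized_rmsecv(3.0, -50.0)
```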

PLS vs PCR modeling efficiencies. From Table 7, one can see that there is little
difference between the results obtained from each regression method. The ratios of PLS/PCR
RMSECV values, shown in Table 8, illustrate that PLS and PCR models perform with
comparable accuracy, although when PLS models are better, they tend to be better by a slightly
larger margin than PCR models that are better than their corresponding PLS models. As the
number  of  latent  variables  required  to  adequately  model  a  particular  property  increases,  so  does 
the  complexity  of  the  resultant  model  and  its  specificity  for  the  particular  fuels  that  constitute  the 
training  set.  As  a  consequence,  models  with  high  numbers  of  latent  variables  tend  to  fail  when 
applied  to  fuels  that  are  not  similar  to  those  used  to  develop  that  particular  model.  Models 
developed  with  high  numbers  of  latent  variables  are  generally  referred  to  as  over-fitting  the  data. 
It  is  therefore  desirable  to  construct  models  that  adequately  relate  to  the  property  of  interest  with 
as  few  latent  variables  as  possible,  trading  off  precision  for  robustness.  Table  9  lists  the  optimum 
number of latent variables required for each property model shown. An examination of this table
shows that, in general, the model that performed best for each property also required the most
latent variables, and thus the greatest model complexity, regardless of regression method. It is
apparent,  however,  that  PCR  models  requiring  high  numbers  of  latent  variables  return  slightly 
less  improvement  in  terms  of  RMSECV  versus  cases  where  PLS  models  require  high  numbers  of 
latent  variables.  For  example,  the  PCR  model  for  flash  point  (D93)  requires  six  extra  latent 
variables  to  obtain  an  RMSECV  that  is  4%  lower  than  the  3  latent  variable  PLS  model. 
Conversely,  the  PLS  model  for  pour  point  requires  only  3  extra  latent  variables  to  obtain  an 
RMSECV  that  is  7%  lower  than  the  seven  latent  variable  PCR  model. 
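
The PLS/PCR ratios discussed above follow directly from the normalized RMSECV values. The short sketch below recomputes four of them from the Table 7 entries and compares them with the ratios listed in Table 8; the variable names are hypothetical.

```python
# (PLS, PCR) normalized RMSECV pairs transcribed from Table 7.
table7 = {
    "Density":          (0.00219, 0.00327),
    "Flash point, D93": (0.0597, 0.0575),
    "Pour point":       (0.0297, 0.0321),
    "Total sulfur":     (0.537, 0.653),
}

# Ratios as listed in Table 8 for the same properties.
table8 = {
    "Density": 0.67,
    "Flash point, D93": 1.04,
    "Pour point": 0.93,
    "Total sulfur": 0.82,
}

# A ratio below 1 means the PLS model had the lower cross-validation error.
ratios = {prop: round(pls / pcr, 2) for prop, (pls, pcr) in table7.items()}
consistent = all(ratios[prop] == table8[prop] for prop in table8)

for prop, ratio in ratios.items():
    better = "PLS" if ratio < 1 else "PCR"
    print(f"{prop:17s} ratio = {ratio:.2f}  ({better} lower)")
print("matches Table 8:", consistent)  # matches Table 8: True
```

For flash point (D93) the ratio of 1.04 corresponds to the roughly 4% advantage of the PCR model cited in the text, and for pour point the ratio of 0.93 corresponds to the roughly 7% advantage of the PLS model.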

Comparison  of  analytical  techniques.  Next  to  be  considered  is  the  question  of  which 
analytical technique provides the best data for chemometric prediction of fuel properties. Table
10 lists the optimum RMSECV for each of the three analytical techniques examined. RMSECV
values obtained from calibrations using GC or spectroscopic data were essentially comparable,
except in the prediction of saturate content and aromatic content, where NIR
performed substantially better. In fact, from Table 10 it can be seen that calibrations constructed
with NIR or Raman data generally performed slightly better than, or comparably to, those
constructed with GC data, with exceptions for freezing point, distillation temperatures, and
viscosity, where calibrations using GC data performed slightly better.

Figure 21. Density of the jet fuel sample set predicted by PLS regression of Raman spectra using
ten latent variables, versus density measured by ASTM D4052.

Figure 22. Jet fuel sample aromatic content predicted by PLS regression of NIR spectra using
seven latent variables, versus aromatic content measured by ASTM D6379 (HPLC method).

Figure 23. Jet fuel sample aromatic content predicted by PLS regression of NIR spectra using
seven latent variables, versus aromatic content measured by ASTM D1319 (FIA method).

Figure 24. Aromatic content in the jet fuel sample set, measured by ASTM D6379 vs. ASTM
D1319.


Regressing  fuel  properties  against  unfolded  GC-MS  data  led  to  some  slight  improvements  in 
prediction  of  freeze  point  and  pour  point,  but  otherwise  performed  somewhere  in  between 
regression  models  utilizing  total  ion  current  chromatograms  and  those  using  NIR  spectra.  N-PLS 
regression  models  utilizing  averaged  GC-MS  data  sets  performed  in  a  comparable  fashion  or 
worse  than  other  regression  models.  In  this  study,  no  advantage  was  seen  in  utilizing  N-PLS 
regression  over  standard  PLS  on  unfolded  data,  especially  in  models  with  the  lowest  RMSECV 
values,  such  as  density  and  refractive  index,  where  N-PLS  gave  RMSECV  values  5  and  16  times 
greater  than  those  constructed  with  unfolded  GC-MS  data. 
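
The difference between the data arrangements compared above can be sketched as follows. A set of GC-MS runs forms a three-way array (sample by retention time by mass channel); "unfolding" flattens each run into one long row so that ordinary two-way PLS can be applied, while summing over the mass axis gives the total ion current chromatogram. The array sizes here are hypothetical, and N-PLS itself, which operates on the three-way array directly, is not shown.

```python
import numpy as np

# Hypothetical sizes: 46 fuels, 300 averaged time points, 100 mass channels.
n_samples, n_times, n_masses = 46, 300, 100
rng = np.random.default_rng(2)
cube = rng.random((n_samples, n_times, n_masses))

# Unfolded data: one row per fuel, mass spectra at successive time points laid end to end.
unfolded = cube.reshape(n_samples, n_times * n_masses)

# Total ion current chromatogram: sum over the mass axis at each time point.
tic = cube.sum(axis=2)

print(unfolded.shape, tic.shape)  # (46, 30000) (46, 300)
```

With only 46 samples and 30,000 unfolded variables, the regression is heavily underdetermined, which is one reason latent variable methods are used for such data.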

Data  preprocessing  methods.  The  effects  of  the  preprocessing  algorithms  considered  in  this 
study  were  inconsistent  from  property  to  property,  decreasing  the  RMSECV  in  many  cases  but 
raising  it  in  others.  For  example,  mean  centering  lowered  RMSECV  values  in  PLS  calibrations 
for  flash  point,  lubricity,  and  viscosity  at  -40  °C  while  slightly  raising  RMSECV  in  PLS 
calibrations  for  saturate  and  aromatic  content.  It  is  likely  that  this  variability  may  be  reduced 
when a larger training set of fuels is considered. In general, however, mean
centering provided either the best RMSECV or an RMSECV that was not more than 10% greater
than that of the best-performing preprocessing scheme. In all cases, applying a second derivative
transformation to the spectroscopic data resulted in a significant increase in RMSECV. Autoscaling
provided results that were comparable to or slightly worse than mean centering in all but a few cases.
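
For reference, the three preprocessing operations compared above can be sketched as below. The second derivative is taken here as a plain second finite difference along the spectral axis; the study may have used a smoothed derivative (for example, Savitzky-Golay), so this is an assumption, as are the function names and the synthetic data.

```python
import numpy as np

def mean_center(X):
    """Subtract the column (variable) means computed over all samples."""
    return X - X.mean(axis=0)

def autoscale(X):
    """Mean center, then scale each variable to unit sample standard deviation."""
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0                 # leave constant variables unscaled
    return (X - X.mean(axis=0)) / std

def second_derivative(X):
    """Second finite difference along the spectral axis (two points shorter than the input)."""
    return np.diff(X, n=2, axis=1)

# Hypothetical spectra: 46 samples by 50 channels.
rng = np.random.default_rng(5)
X = rng.normal(loc=5.0, scale=2.0, size=(46, 50))

centered = mean_center(X)
scaled = autoscale(X)
derived = second_derivative(X)
print(centered.shape, scaled.shape, derived.shape)  # (46, 50) (46, 50) (46, 48)
```

A finite-difference derivative amplifies high-frequency noise, which is one plausible reason a derivative transformation could raise RMSECV on a small training set.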


Summary  of  Results 

Feasibility of chemometric prediction of a number of critical fuel properties from measured
capillary gas chromatograms, NIR spectra, and Raman spectra has been demonstrated. Root
mean square errors of cross-validation of predictive models for critical properties of interest were less
than 10% of observed mean property values, and commensurate with uncertainties expected from
the ASTM methods utilized to acquire reference values for this study. The findings of this study
do  not  indicate  that  there  were  any  statistically  significant  differences  between  NIR  and  Raman 
spectroscopy  in  terms  of  regression  model  performance.  Thus,  a  system  based  solely  on  either 
NIR  or  Raman  spectral  data  should  perform  equally  well,  although  a  system  based  on  both  GC 
and  spectral  instruments  could  provide  some  improvement,  particularly  in  the  prediction  of  freeze 
point  and  viscosity. 

Predictions based on unfolded GC-MS chromatograms and N-PLS regression of whole GC-MS
chromatograms failed to yield significant improvement over predictions based on calibration
models utilizing total ion current chromatograms, or capillary gas chromatography. A sensor-based
fuel quality assessment system could in theory be augmented with specialized sensors to
allow evaluation of properties that are not amenable to chemometric prediction from
chromatography or spectroscopy alone, such as conductivity, total sulfur, and acid number.

                                  ASTM Method Errors      PLS Minimum RMSECV
ASTM     Property                 Reprod.    Repeat.      NIR           Raman     GC
D4052    Density                  0.0005     0.0001       0.0026        0.0018    0.0031
D93      Flash Pt. (PM)           32.0       33.3         [illegible]   8.9       7.1
D3828    Flash Pt. (mini)         37.0       33.3         7.8           7.5       7.4
D5972    Freeze Pt.               1.3        0.7          3.6           4.2       3.0
D5949    Pour Pt.                 6.8        3.4          2.7           2.2       2.1
D2622    Sulfur                   30         21           283           233       419
D1840    Naphthalenes             0.069      0.051        0.47          0.41      0.55
D1319    Arom., FIA               2.70       1.30         0.66          0.92      1.32
D6379    Arom., HPLC              1.897      0.938        0.64          1.09      1.48
D1319    Saturates                4.40       1.40         0.75          1.20      1.37
D1159    Olefins                  0.40       0.20         0.21          0.34      0.22
D1319    Olefins, FIA             2.10       0.60         0.34          0.48      0.31
D3701    Hydrogen                 0.11       0.09         0.11          0.20      0.12
D4809    Heat Content             77.4       22.9         27.0          48.0      40.4
D445     Visc. 20°C               -          -            0.25          0.30      0.24
D445     Visc. -20°C              -          -            0.47          0.47      0.39
D445     Visc. -40°C              -          -            1.16          1.00      0.86
D1218    Ref. Index               0.0005     0.0002       0.0012        0.0019    0.0013
D2624    Conductivity             17         5            77            92        87
D3242    TAN                      0.0030     0.0010       0.0034        0.0030    0.0034
D3241    Thermal Stab.            -          -            46            58        47
D5001    Lubricity                0.070      0.046        0.037         0.038     0.033
D86      IBP                      15.3       6.3          11.0          11.4      9.6
D86      10%                      10.8       5.1          9.2           7.8       5.7
D86      20%                      13.9       5.3          8.5           6.9       5.1
D86      50%                      24.2       9.0          8.9           15.2      10.5
D86      90%                      11.6       5.4          14.0          13.3      10.9

Table 6. Comparison of regression model root mean square error of cross-validation (RMSECV)
with the published ASTM method reproducibility and repeatability, calculated from the mean
observed value for each property over all samples.

Figure 25. RMSECV values for PLS and PCR models, compared to the corresponding ASTM
method repeatability for each property, normalized by the mean property values measured for this
fuel dataset.


5.0  CONCLUSIONS 

5.1  Chemometric  Analysis  of  Chromatographic  Data 

The findings of this study have shown that chemometric analysis can be successfully
employed to establish a relationship between the composition of a particular fuel and its suitability
for use. With proper preprocessing, jet fuels from one particular refinery were successfully
classified according to their tendency to cause catastrophic engine failures through excessive
coking, on the basis of data obtained through gas chromatography with flame ionization detection.
Although this approach proved successful in the scenario examined, it is likely that the accuracy of a
classification model developed for one set of fuels will suffer when applied to a new set of fuels
obtained some time later. This is due to normal variations in fuel feed stocks and refinery
operating parameters that are difficult to model explicitly. Thus, in addition to proper instrument
maintenance and data pretreatment, classification model maintenance and calibration against
updated fuel sample sets are required to maintain high accuracy of such a classifier.

Property                  RMSECV PLS    RMSECV PCR
Refractive index          0.00085       0.000941
Sp. heat cap. at 0 °C     0.00146       0.00172
Density at 60 °F          0.00219       0.00327
Hydrogen                  0.00765       0.00735
Saturates                 0.00929       0.0103
Dist 10%                  0.0164        0.0174
Dist 50%                  0.0226        0.0244
Dist FBP                  0.0291        0.0289
Pour Point                0.0297        0.0321
Dist IBP                  0.0302        0.0304
Aromatics, D6379          0.0329        0.0357
Aromatics, D1319          0.0366        0.0429
Lubricity scar            0.0540        0.0624
Freezing point            0.0570        0.0559
Flash point, D93          0.0597        0.0575
Flash point, D3828        0.0615        0.0601
Viscosity at -20 °C       0.0920        0.0916
Viscosity at -40 °C       0.0995        0.0996
Viscosity at 20 °C        0.136         0.134
Thermal stability         0.165         0.163
Olefins, D1319            0.194         0.192
Naphthalenes              0.319         0.329
Total sulfur              0.537         0.653
Acid number               0.545         0.545
Olefins, D1159/D27        0.605         0.576
Conductivity              0.840         0.909

Table 7. Summary of PLS and PCR calibration results. The lowest observed RMSECV for each
property (normalized to mean property value) is displayed and these results are sorted from
lowest to highest of the PLS results.


A potential route to address this issue is through the use of higher-order analytical
instrumentation and multi-way chemometric algorithms.27-32 These techniques, in principle,
allow for the deconvolution of independently varying components from sample to sample,
regardless of the presence of unknown interfering chemical species, thus vastly increasing the
selectivity and power of the entire analytical method. It has been shown that expanding fuel
analysis to a three-dimensional analytical technique, such as gas chromatography with mass
spectrometry detection, reduced model dependence on fuel source by allowing the elucidation of
trace level changes in fuel composition down to the individual compound level. This type of
analysis could provide a means to characterize compositional changes in fuels associated with
degradation to an unprecedented level of detail.

Property                  Ratio     Property                  Ratio
Density at 60 °F          0.67      Viscosity at 20 °C        [illegible]
Flash point D93           1.04      Viscosity at -20 °C       [illegible]
Flash point D3828         1.02      Viscosity at -40 °C       [illegible]
Freezing point            1.02      Refractive index          [illegible]
Pour point                0.93      Conductivity              [illegible]
Total sulfur              0.82      Acid number               [illegible]
Naphthalenes              0.97      Thermal stability         1.01
Aromatics, D1319          0.85      Lubricity scar            0.87
Aromatics, D6379          0.92      Dist IBP                  0.99
Saturates                 0.90      Dist 10%                  0.94
Olefins, D1159/D27        1.05      Dist 20%                  0.97
Olefins, D1319            1.01      Dist 50%                  0.93
Hydrogen content          1.04      Dist 90%                  1.05
Sp. heat cap. at 0 °C     0.85      Dist FBP                  1.01

Table 8. Comparison of PLS versus PCR model performance. Listed are the ratios of lowest
PLS model RMSECV to lowest PCR model RMSECV for each fuel property.

Property                  PLS    PCR      Property                  PLS    PCR
Density at 60 °F          10     10       Viscosity at 20 °C        2      6
Flash point D93           3      9        Viscosity at -20 °C       1      1
Flash point D3828         4      9        Viscosity at -40 °C       3      7
Freezing point            5      10       Refractive index          7      8
Pour point                10     7        Conductivity              3      4
Total sulfur              10     2        Acid number               1      1
Naphthalenes              8      5        Thermal stability         1      1
Aromatics, D1319          8      8        Lubricity scar            8      5
Aromatics, D6379          10     9        Dist IBP                  3      7
Saturates                 8      8        Dist 10%                  9 [one value legible]
Olefins, D1159/D27        6      7        Dist 20%                  9 [one value legible]
Olefins, D1319            1      3        Dist 50%                  9 [one value legible]
Hydrogen                  7      8        Dist 90%                  9 [one value legible]
Sp. heat cap. at 0 °C     6      4        Dist FBP                  9 [one value legible]

Table 9. Number of latent variables utilized in best PLS and PCR models.

Property                  ASTM Spec.    GC        NIR       Raman
Refractive index          D1218         0.0009    0.0009    0.0013
Density at 60 °F          D4052         0.0038    0.0032    0.0022
Flash point               D93           0.0597    0.0734    0.0747
Flash point               D3828         0.0615    0.0652    0.0627
Freeze point              D5972         0.0570    0.0699    0.0810
Pour point                D5949         0.0298    0.0394    0.0322
Naphthalenes              D1840         0.4321    0.3700    0.3191
Aromatics                 D1319         0.0739    0.0367    0.0513
Aromatics                 D6379         0.0759    0.0329    0.0559
Saturates                 D1319         0.0170    0.0093    0.0149
Olefins                   D1159/D27     0.6148    0.6051    0.9586
Olefins                   D1319         0.1947    0.2116    0.2980
Lubricity, Bocle          D5001         0.0540    0.0607    0.0618
Viscosity at 20 °C        D445          0.1364    0.1424    0.1682
Viscosity at -20 °C       D445          0.0920    0.1131    0.1114
Viscosity at -40 °C       D445          0.0996    0.1349    0.1158
Acid number               D3242         0.6173    0.6135    0.5459
Thermal stability         D3241         0.1655    0.1652    0.2064
Mercaptan sulfur          D3227         0.9679    0.6527    0.5378
Sp. heat cap. at 0 °C     E1269         0.0597    0.0734    0.0747
Conductivity              D2624         0.972     0.8408    1.0748
Acid number               D3242         0.6173    0.6135    0.5459
Distillation IBP          D86           0.0303    0.0345    0.0357
Distillation 10%          D86           0.0164    0.0265    0.0225
Distillation 50%          D86           0.0268    0.0227    0.0386
Distillation FBP          D86           0.0303    0.0349    0.0292

Table 10. Comparison of model performance according to analytical method used. Lowest
RMSECV values for each property (normalized to mean property value) are listed in bold type.


It has also been shown that a key to successful chemometric analysis of complex
chromatographic fuel data is the implementation of an ANOVA calculation performed at every
time-resolved data point. This serves as a powerful feature selection utility to locate and extract
chemically relevant data from much larger and more complex GC-MS datasets. This extracted subset
of relevant information was referred to in this paper as the feature selected data. PARAFAC
analysis (a multi-way factor analysis algorithm) of the GC-MS data cube, after ANOVA feature
selection, was shown, in principle, to be a successful route to elucidating chemical changes down
to the individual compound level. This capability provides a wide set of opportunities for
significant improvements in fuel diagnostics and analysis. For example, by comparing GC-MS
datasets from a fuel before and after thermal stress in this manner, it would be possible to obtain
structural information on only those molecular species that changed in response to thermal stress.
In principle, this will provide an excellent starting place to begin to characterize and/or validate
likely mechanisms of fuel degradation. By coupling the chemometric algorithm's output to an
electronic NIST mass spectrum database, it is possible to automatically generate a list of
compounds that have either increased or decreased in concentration in response to a given
stimulus. The utility of this approach was demonstrated with two experiments: one in which
compositional changes were detected in Naval distillate fuel during mild thermal stress, and one
in which evaporative losses were detected in vented bottle testing.
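
The point-by-point ANOVA feature selection described above can be illustrated with a one-way F ratio computed independently at every data point. This is a minimal sketch under stated assumptions: two classes with five replicate chromatograms each, synthetic data, and selection of the five points with the largest F ratio. The function name `anova_f_ratio`, the class labels, and the selection rule are hypothetical and are not taken from the study.

```python
import numpy as np

def anova_f_ratio(X, labels):
    """One-way ANOVA F ratio for each column of X (rows are replicate measurements)."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[labels == c]
        class_mean = Xc.mean(axis=0)
        ss_between += Xc.shape[0] * (class_mean - grand_mean) ** 2
        ss_within += ((Xc - class_mean) ** 2).sum(axis=0)
    df_between = len(classes) - 1
    df_within = X.shape[0] - len(classes)
    return (ss_between / df_between) / (ss_within / df_within)

# Synthetic chromatograms: 5 replicates per class, 50 time points, noise only,
# except that the "stressed" class gains signal at points 10 through 14.
rng = np.random.default_rng(4)
n_rep, n_points = 5, 50
fresh = rng.normal(0.0, 0.1, size=(n_rep, n_points))
stressed = rng.normal(0.0, 0.1, size=(n_rep, n_points))
stressed[:, 10:15] += 5.0

X = np.vstack([fresh, stressed])
labels = np.array(["fresh"] * n_rep + ["stressed"] * n_rep)

f_ratio = anova_f_ratio(X, labels)
selected = np.sort(np.argsort(f_ratio)[-5:])   # keep the five highest-F points
print(selected)  # [10 11 12 13 14]
```

For GC-MS data the same calculation would be applied at every retention time and mass channel, and only the points with high F ratios would be passed on to PARAFAC or PCA.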

Chemometric  analysis  of  a  Naval  distillate  fuel  subjected  to  thermal  stress  in  the  presence  of 
catalytic  copper,  in  a  closed  system,  was  shown  to  result  in  the  same  overall  chemical 
composition as the same fuel stressed without copper. This indicated that, in this instance,
while copper accelerated the autoxidation of the fuel, the reaction mechanisms that produced the
final composition were similar to those in the neat fuel.

Comparison of the GC-MS of an adulterated or blended fuel with that from the unadulterated
fuel can also provide an extremely sensitive identification of compounds in the adulterant or
co-blended fuel. The feature selected chromatogram of the fuel adulterant can then be input to the
appropriate  PCA  model  to  reveal  the  likely  type  of  fuel  that  has  been  commingled  with  the 
original  sample.  This  also  can  be  used  to  selectively  identify  fuel  adulterants  that  may  have  been 
added  to  the  fuel  either  purposely  or  by  accident,  before  acceptance  for  use.  This  was 
demonstrated  in  this  study  in  the  discrimination  and  quantification  of  trace  levels  of  home  heating 
oil,  in  a  compositionally  similar  diesel  fuel  matrix. 


5.2  Development  of  A  Sensor-Based  Fuel  Quality  Assessment  Capability 

A sensor-based fuel quality technology is based on relating a numerical representation of fuel
composition to the properties of interest. There are two major aspects to this approach: the type
of sensor and its response to compositional features, and the chemometric models that predict the
properties from the sensor data. In this study, we have demonstrated the feasibility of predicting
a number of critical fuel properties from compositional data. Reasonable property predictions
were obtained from a small (46 sample) training set using capillary gas chromatography, NIR, and
Raman  spectroscopy.  The  advantages  of  optical  sensors  over  chromatographic  devices  for  field 
use  were  discussed  above,  and  thus  the  majority  of  this  facet  of  the  study  was  focused  on 
spectroscopy. 

In  assessing  the  accuracy  of  the  chemometric  models  to  predict  a  particular  fuel  property,  the 
most  obvious  way  is  to  directly  compare  the  predicted  and  measured  values.  However,  the 
development of a global parametric measurement of error is complicated by the fact that the
standard test methods to which the chemometric predictions are being compared also contain
error. A common way to express the errors of prediction by a chemometric model is to compute
the  root  mean  square  error  of  prediction  (RMSEP)  of  a  prediction  subset  of  available  data 
utilizing  a  model  constructed  from  a  separate  “calibration”  subset  of  data.33  In  this  preliminary 
work,  however,  the  training  set  consisted  of  only  46  jet  fuel  samples,  necessitating  model 
evaluation  through  “leave-one-out”  cross  validation,  in  which  the  prediction  of  a  given  sample  is 
made  utilizing  a  model  generated  from  the  remaining  samples.  Thus,  when  comparing  predicted 
vs  measured  values  for  a  property  of  interest,  the  root  mean  square  error  of  cross-validation  was 
used  instead  of  the  RMSEP.  For  the  purpose  of  comparing  the  errors  of  prediction  across  the 
various  fuel  property  models,  the  RMSECV  values  were  normalized  to  the  mean  of  the  predicted 
values. 
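
The leave-one-out procedure described above can be sketched as follows, here with principal component regression (PCR) as the model for brevity. The function names, the synthetic data, and the choice to normalize by the mean of the measured values are assumptions made for this illustration.

```python
import numpy as np

def pcr_fit(X, y, n_pc):
    """Principal component regression on mean-centered data using `n_pc` components."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_pc].T                               # loadings of the retained components
    T = Xc @ V                                    # scores
    q, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    return x_mean, y_mean, V @ q                  # regression vector in original variables

def loo_rmsecv(X, y, n_pc):
    """Leave-one-out RMSECV: each sample is predicted by a model built without it."""
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        x_mean, y_mean, b = pcr_fit(X[keep], y[keep], n_pc)
        errors[i] = y[i] - (y_mean + (X[i] - x_mean) @ b)
    return float(np.sqrt(np.mean(errors ** 2)))

# Synthetic training set: 46 samples, 20 variables driven by 3 underlying factors.
rng = np.random.default_rng(3)
scores = rng.normal(size=(46, 3))
X = scores @ rng.normal(size=(3, 20)) + 0.05 * rng.normal(size=(46, 20))
y = 50.0 + scores @ np.array([4.0, -2.0, 1.0]) + 0.5 * rng.normal(size=46)

rmsecv = loo_rmsecv(X, y, n_pc=3)
normalized = rmsecv / abs(y.mean())
print(f"RMSECV = {rmsecv:.3f}, normalized = {normalized:.4f}")
```

Because every sample is left out exactly once, the procedure uses all 46 samples for both calibration and testing, which is why it is preferred over a separate prediction subset when the training set is this small.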

The results of this initial study were successful in producing models with RMSECV values
for the critical fuel properties, i.e., flash point, density, freeze point, aromatics, and viscosity, that
were less than 10% of observed mean property values within the training set used, and
commensurate with uncertainties expected from the ASTM methods utilized to acquire reference
values for this study. This indicates that chemometric prediction of fuel properties from
compositional data is feasible, although much more work must be done to validate the robustness
of this approach as well as to develop rigorous implementation guidelines and requirements.

Comparison  of  GC,  NIR  and  Raman  spectroscopy  indicates  that  a  fuel  quality  sensor 
platform  based  on  either  NIR  or  Raman  spectral  data  should  perform  acceptably,  although  GC 
could provide some improvement, particularly in the prediction of freeze point and viscosity. The
content of fuel system icing inhibitor (FSII) remains to be predicted adequately by
spectroscopy without any type of sample preparation. It appears that the structural features of the
FSII additive (diethylene glycol monomethyl ether) that are detectable by spectroscopy may not
be sufficiently distinct from those of other fuel constituents to model chemometrically in fuel. Future
studies  will  be  directed  towards  the  exploration  of  specialized  mathematical  treatments  and 
potentially,  alternate  analytical  methodologies  to  accomplish  this.  For  example,  a  specialized 
chromatographic  method,  either  alone  or  in  combination  with  spectroscopy,  may  offer  a  pathway 
towards  the  successful  direct  measurement  of  FSII  in  jet  fuel. 

Taking  this  pathway  a  bit  further,  it  is  expected  that  data  fusion  techniques  in  which  several 
types  of  analytical  data  can  be  combined  in  a  single  predictive  model  have  the  potential  to  offer 
distinct advantages over each measurement technique alone, in a manner analogous to the
advantages offered by multivariate instrumentation (e.g., spectroscopy) over univariate
instrumentation (e.g., single-point sensors). Thus, a model that combines both chromatographic
and  spectroscopic  data  may  be  capable  of  accurately  measuring  FSII  content,  along  with  multiple 
other  properties  that  are  not  currently  amenable  to  chemometric  modeling  from  spectroscopy 
alone.  Taking  this  approach,  the  measurement  of  a  broad  range  of  fuel  properties,  including  those 
that  are  not  amenable  to  chemometric  prediction  from  GC  or  NIR  data,  such  as  conductivity,  total 
sulfur,  and  acid  number,  would  in  theory,  be  attainable  with  a  NIR-based  fuel  quality  assessment 
system  that  is  augmented  with  a  set  of  carefully  chosen  specialized  sensors. 
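
One simple way to combine chromatographic and spectroscopic data in a single model is low-level data fusion, in which the blocks are placed side by side after each has been scaled so that neither dominates. The text does not specify a fusion method, so the sketch below is only one common option; the block sizes and the names `block_scale` and `fuse` are hypothetical.

```python
import numpy as np

def block_scale(X):
    """Mean center a data block and scale it to unit total (Frobenius) norm."""
    Xc = X - X.mean(axis=0)
    return Xc / np.linalg.norm(Xc)

def fuse(*blocks):
    """Concatenate block-scaled data matrices that share the same samples (rows)."""
    return np.hstack([block_scale(block) for block in blocks])

# Hypothetical blocks for the same 46 fuels: GC (500 points, large signal scale)
# and NIR (200 channels, small signal scale).
rng = np.random.default_rng(6)
gc = 1000.0 * rng.random((46, 500))
nir = rng.random((46, 200))

fused = fuse(gc, nir)
print(fused.shape)  # (46, 700)
```

The fused matrix can then be passed to PLS or PCR like any single-instrument data set. In practice the centering and scaling constants determined on the calibration set would have to be stored and reused for new samples.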

To  achieve  reliable,  robust  prediction  of  these  properties  in  practice,  a  calibration  set  of  fuels 
must  be  assembled  that  accurately  reflects  the  span  of  levels  at  which  each  of  these  components  is 
present  in  the  range  and  type  of  fuels  that  are  to  be  evaluated.  Current  work  in  this  area  is  being 
undertaken  to  broaden  the  scope  of  the  fuel  training  set.  It  is  also  a  goal  of  this  program  to 
produce  a  stand-alone  software  implementation  of  these  chemometric  models,  which  would 
provide the means to export them to other fuel laboratories for use. The software would be
constructed so as to allow the user to update the models as new data are obtained.

A  major  caveat  to  these  conclusions,  however,  lies  in  the  limited  nature  of  this  feasibility 
study.  Most  of  the  fuel  properties  examined  arise  from  complex  interactions  between  many 
different  fuel  constituents.  To  achieve  reliable,  robust  prediction  of  these  properties  in  practice,  a 
calibration  set  of  fuels  must  be  assembled  that  accurately  reflects  the  span  of  levels  at  which  each 
of  these  components  is  present  in  the  range  and  type  of  fuels  that  are  to  be  evaluated.  Naturally, 
this  leads  to  a  requirement  of  hundreds,  if  not  thousands  of  calibration  samples  to  achieve 
predictions  of  high  accuracy  and  precision,  as  well  as  to  an  implicit  requirement  that  the 
instrumentation  used  to  obtain  the  calibration  data  is  of  high  reliability  and  precision.  This 
phenomenon  has  been  observed  in  work  done  by  others  in  the  prediction  of  octane  number  of 
gasoline  via  chemometric  regression  on  NIR  spectra.  Thus,  further  work  involving  wider  sample 
sets  and  increased  replicates  is  required  to  fully  ascertain  the  limitations  of  this  method. 